Two Approaches to Grammar Induction: From Plain Text to Semantic Supervision
- 👤 Speaker: Omri Abend, The Hebrew University of Jerusalem
- 📅 Date & Time: Wednesday 06 March 2013, 12:00 - 13:00
- 📍 Venue: FW11, Computer Laboratory
Abstract
Grammatical representation is in wide use in a variety of NLP tasks such as Machine Translation and Textual Entailment. Semi-supervised and unsupervised approaches to grammar induction are increasingly being used, and offer applicative and theoretical advantages over their supervised counterparts. However, it is still an open question what type of input is sufficient to learn grammar in all its complexity. This question is directly related to one of the central questions in language science: whether and how language can be acquired from experience. I will discuss two approaches to this question that I have been pursuing in my research.
The fully unsupervised approach uses only plain text as input, treating language as an ordered collection of semantically-void symbols. I will present a number of works that effectively apply this minimalist approach to tasks at the syntax-semantics interface, such as argument identification and the classification of verbal arguments into cores and adjuncts.
However, despite their appeal, the performance of unsupervised models still lags considerably behind the state of the art. In the second part of the talk I will discuss a complementary approach to grammar induction, which also assumes semantic corpus annotation as input. Concretely, I will present UCCA (Universal Conceptual Cognitive Analysis), a novel scheme that provides a formal framework for semantic representation in general and for grammar induction in particular. UCCA covers many of the most important elements and relations present in linguistic utterances, including the argument structure of various types of predicates and the linkage between them, but confines itself to semantic distinctions. For instance, UCCA represents the similarity between “John made an appearance” and “John appeared”, disregarding their syntactic differences. I will also touch on our current efforts to construct a UCCA parser and to apply UCCA to statistical machine translation. A UCCA-annotated corpus will be released during 2013.
Joint work with Ari Rappoport.
This talk is part of the NLIP Seminar Series.