Two Approaches to Grammar Induction: From Plain Text to Semantic Supervision

If you have a question about this talk, please contact Ekaterina Kochmar.

Grammatical representations are widely used in NLP tasks such as Machine Translation and Textual Entailment. Semi-supervised and unsupervised approaches to grammar induction are increasingly being used, and offer practical and theoretical advantages over their supervised counterparts. However, it is still an open question what type of input is sufficient to learn grammar in all its complexity. This question is directly related to one of the central questions in language science: whether and how language can be acquired from experience. I will discuss two approaches to this question that I have been pursuing in my research.

The fully unsupervised approach uses only plain text as input, treating language as an ordered collection of semantically void symbols. I will present a number of works that effectively apply this minimalist approach to tasks at the syntax-semantics interface, such as argument identification and the classification of verbal arguments into cores and adjuncts.

However, despite their appeal, the performance of unsupervised models still lags considerably behind the state of the art. In the second part of the talk I will discuss a complementary approach to grammar induction, which also assumes semantic corpus annotation as input. Concretely, I will present UCCA (Universal Conceptual Cognitive Analysis), a novel scheme that provides a formal framework for semantic representation in general and for grammar induction in particular. UCCA covers many of the most important elements and relations present in linguistic utterances, including the argument structure of various types of predicates and the linkage between them, but confines itself to semantic distinctions. For instance, UCCA represents the similarity between “John made an appearance” and “John appeared”, disregarding their syntactic differences. I will also touch on our current efforts to construct a UCCA parser and to apply UCCA to statistical machine translation. A UCCA-annotated corpus will be released during 2013.

Joint work with Ari Rappoport.

This talk is part of the NLIP Seminar Series.

