Appellation d'Origine Contrôlée - Language variability: a major challenge for natural language applications

If you have a question about this talk, please contact Susan Rolfe.

Many Natural Language Processing (NLP) applications are challenged by the great variability of language: the same word may refer to different things, and the same idea can be expressed in a variety of ways. This talk will focus on language variability and, in particular, on how existing NLP techniques handle it.

I will first focus on named entities, i.e., sequences of text corresponding to person names, location names, dates, currencies, etc. Named entities are important for text processing because they are good indicators of a text's content and can serve as a basis for deeper analysis. They are typically regarded as "rigid designators", unambiguously referring to a single, stable entity in the world. I will show that this assumption does not always hold; rather, the meaning of a named entity can be affected by context. I will illustrate this with the case of metonymy and show that, although metonymy is a relatively well-understood linguistic phenomenon, it is difficult to analyse with a fully automatic approach.

I will then turn to discourse processing, specifically a task that aims to structure free text automatically according to a set of semantic principles. Automatic discourse analysis is challenging because it requires taking into account multiple linguistic cues and the complex patterns in which they interact. These patterns may contain conflicting information, between which the parser has to choose. I will present a framework designed specifically to choose an optimal solution from a range of complex, interacting constraints that sometimes contradict one another. This approach has been implemented and evaluated on Health Practices Guidelines (i.e., short documents describing the practices that physicians should follow).

In the conclusion, I will discuss the dependence of computational approaches on language usage: the meaning of a linguistic item depends largely on context, and context is difficult to model in advance.

This talk is part of the RCEAL Tuesday Colloquia series.

© 2006-2017 Talks.cam, University of Cambridge.