Bayesian Word Sense Induction

If you have a question about this talk, please contact Diarmuid Ó Séaghdha.

At this session of the NLIP Reading Group we’ll be discussing the following paper:

Samuel Brody and Mirella Lapata. 2009. Bayesian word sense induction. In Proceedings of EACL-09.

Abstract: Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical co-occurrences and to systematically assess their utility on the sense induction task. The proposed approach yields improvements over state-of-the-art systems on a benchmark dataset.

Like some work presented at recent *ACL conferences, the paper builds on the Latent Dirichlet Allocation (LDA) model (a.k.a. the standard “topic model”). For a more thorough introduction to LDA, the following paper is recommended:

Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101: 5228-5235.
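
For those who would like a concrete picture before the session, here is a minimal sketch of the LDA-style generative story summarised in the abstract: each context of an ambiguous word has its own multinomial distribution over senses, and each sense is a distribution over words. This is only an illustration under stated assumptions, not the paper's full model (which also incorporates features beyond lexical co-occurrence). The toy vocabulary, the two senses, the hyperparameter values and all function names are made up for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_context(sense_word_dists, alpha, length, rng):
    """Generate one context window for the ambiguous word.

    theta is the context's distribution over senses; every word in the
    context first picks a sense from theta, then a word from that sense.
    """
    theta = sample_dirichlet(alpha, rng)
    senses, words = [], []
    for _ in range(length):
        s = sample_categorical(theta, rng)
        w = sample_categorical(sense_word_dists[s], rng)
        senses.append(s)
        words.append(w)
    return theta, senses, words


# Hypothetical toy setup: two senses of "bank" over a four-word vocabulary.
VOCAB = ["river", "water", "money", "loan"]
NUM_SENSES = 2
rng = random.Random(0)

# Each sense is a distribution over the vocabulary (assumed symmetric prior).
sense_word_dists = [
    sample_dirichlet([0.5] * len(VOCAB), rng) for _ in range(NUM_SENSES)
]

theta, senses, words = generate_context(
    sense_word_dists, alpha=[1.0] * NUM_SENSES, length=6, rng=rng
)
print("sense proportions:", theta)
print("context words:", [VOCAB[w] for w in words])
```

Sense induction runs this story in reverse: given only the observed context words, inference (for example the Gibbs sampling procedure described by Griffiths and Steyvers) recovers the hidden sense assignments.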

This talk is part of the Natural Language Processing Reading Group series.
