Bayesian Word Sense Induction
- 👤 Speaker: Andreas Vlachos (University of Cambridge)
- 📅 Date & Time: Monday 26 October 2009, 12:30 - 13:30
- 📍 Venue: GS15, Computer Laboratory
Abstract
At this session of the NLIP Reading Group we’ll be discussing the following paper:
Samuel Brody. 2009. Bayesian word sense induction. In Proceedings of EACL-09.
Abstract: Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical co-occurrences and to systematically assess their utility on the sense induction task. The proposed approach yields improvements over state-of-the-art systems on a benchmark dataset.
Like some other work presented at recent *ACL conferences, the paper builds on the Latent Dirichlet Allocation model (a.k.a. the standard “topic model”). For a more thorough introduction to the latter, the following paper is recommended:
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101: 5228-5235.
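The generative story described in the abstract (contexts of the ambiguous word drawn from a multinomial over senses, senses characterized as distributions over words) can be fitted with collapsed Gibbs sampling, as in Griffiths and Steyvers. The sketch below is a minimal, illustrative implementation of that LDA-style sampler over bag-of-words contexts, not the paper's actual system; all function names, hyperparameter values, and the toy data are assumptions for illustration.

```python
import random
from collections import defaultdict

def induce_senses(contexts, num_senses=2, alpha=0.1, beta=0.1,
                  iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA-style sense induction.

    Each context of the ambiguous word is a bag of words; each sense is a
    distribution over words, and each context a distribution over senses.
    Hyperparameters alpha/beta are Dirichlet priors (values chosen for
    illustration only). Returns the majority sense label per context.
    """
    rng = random.Random(seed)
    vocab = {w for ctx in contexts for w in ctx}
    V = len(vocab)
    # Count tables: senses per context, words per sense, totals per sense.
    ctx_sense = [[0] * num_senses for _ in contexts]
    sense_word = [defaultdict(int) for _ in range(num_senses)]
    sense_total = [0] * num_senses
    # Random initialization of sense assignments.
    assignments = []
    for d, ctx in enumerate(contexts):
        zs = []
        for w in ctx:
            z = rng.randrange(num_senses)
            zs.append(z)
            ctx_sense[d][z] += 1
            sense_word[z][w] += 1
            sense_total[z] += 1
        assignments.append(zs)
    # Gibbs sweeps: resample each token's sense given all others.
    for _ in range(iters):
        for d, ctx in enumerate(contexts):
            for i, w in enumerate(ctx):
                z = assignments[d][i]
                ctx_sense[d][z] -= 1
                sense_word[z][w] -= 1
                sense_total[z] -= 1
                # P(z=k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + beta*V)
                weights = [(ctx_sense[d][k] + alpha)
                           * (sense_word[k][w] + beta)
                           / (sense_total[k] + beta * V)
                           for k in range(num_senses)]
                r = rng.random() * sum(weights)
                for k, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        z = k
                        break
                assignments[d][i] = z
                ctx_sense[d][z] += 1
                sense_word[z][w] += 1
                sense_total[z] += 1
    # Label each context with its most frequent sense.
    return [max(range(num_senses), key=lambda k: ctx_sense[d][k])
            for d in range(len(contexts))]

# Toy contexts for an ambiguous word like "bank" (made-up data).
contexts = [["money", "loan", "deposit"], ["loan", "money", "interest"],
            ["river", "water", "shore"], ["water", "shore", "river"]]
labels = induce_senses(contexts)
```

Incorporating the richer features the paper discusses (e.g. parts of speech or dependencies) would amount to adding further word-like observation layers to this model, each with its own sense-conditional distribution.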
Series: This talk is part of the Natural Language Processing Reading Group series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- GS15, Computer Laboratory
- Guy Emerson's list
- Natural Language Processing Reading Group