Term Weighting Schemes for Latent Dirichlet Allocation
- Speaker: James Jardine, Computer Laboratory, Cambridge
- Date & Time: Monday 08 November 2010, 12:30 - 13:30
- Venue: GS15, Computer Laboratory
Abstract
Hi all. I will be presenting the following paper on LDA:
@conference{wilson2010term, title={{Term Weighting Schemes for Latent Dirichlet Allocation}}, author={Wilson, A.T. and Chew, P.A.}, booktitle={Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, pages={465--473}, year={2010}, organization={Association for Computational Linguistics} }
Many implementations of Latent Dirichlet Allocation (LDA), including those described in Blei et al. (2003), rely at some point on the removal of stopwords, words which are assumed to contribute little to the meaning of the text. This step is considered necessary because otherwise high-frequency words tend to end up scattered across many of the latent topics without much rhyme or reason. We show, however, that the “problem” of high-frequency words can be dealt with more elegantly, and in a way that to our knowledge has not been considered in LDA, through the use of appropriate weighting schemes comparable to those sometimes used in Latent Semantic Indexing (LSI). Our proposed weighting methods not only make theoretical sense, but can also be shown to improve precision significantly on a non-trivial cross-language retrieval task.
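To make the idea of an LSI-style weighting scheme concrete, here is a minimal sketch of log-entropy weighting, one scheme commonly used with LSI. It is an illustration only: the abstract does not say which schemes the paper uses, and the function name `log_entropy_weights` and the toy documents are assumptions made for this example. The point it shows is that a word spread evenly across all documents (a typical stopword) receives a global weight near 0, while a word concentrated in one document receives a weight of 1, so no explicit stopword list is needed.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Hypothetical helper: log-entropy weights for a list of token lists.

    Returns (global_weight, weighted_docs), where global_weight maps each
    term to a value in [0, 1] and weighted_docs holds one dict per document
    mapping term -> log(1 + tf) * global_weight[term].
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Total frequency of each term over the whole collection.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    global_weight = {}
    for term, gf in global_freq.items():
        if n_docs < 2:
            # Entropy normalisation is undefined for a single document.
            global_weight[term] = 1.0
            continue
        # Sum of p * log(p), where p is the share of the term's
        # occurrences that falls in each document.
        plogp = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / gf
                plogp += p * math.log(p)
        global_weight[term] = 1.0 + plogp / math.log(n_docs)

    weighted_docs = [
        {t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()}
        for c in counts
    ]
    return global_weight, weighted_docs


# Toy collection (assumed data): "the" occurs once in every document.
toy_docs = [["the", "cat"], ["the", "dog"], ["the", "fish"]]
toy_global, toy_weighted = log_entropy_weights(toy_docs)
print(toy_global)
```

How such weights would enter LDA inference (for example, as fractional counts in place of raw term counts) is not specified in the abstract, so this sketch stops at computing the weights.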
Series This talk is part of the Natural Language Processing Reading Group series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- GS15, Computer Laboratory
- Guy Emerson's list
- Natural Language Processing Reading Group
Note: Ex-directory lists are not shown.