BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Term Weighting Schemes for Latent Dirichlet Allocation - James Jar
 dine\, Computer Laboratory\, Cambridge
DTSTART:20101108T123000Z
DTEND:20101108T133000Z
UID:TALK27891@talks.cam.ac.uk
CONTACT:Jimme Jardine
DESCRIPTION:Hi all. I will be presenting a paper on LDA.\n\n@conference
 {wilson2010term\,\n  title={{Term Weighting Schemes for Latent Dirichlet A
 llocation}}\,\n  author={Wilson\, A.T. and Chew\, P.A.}\,\n  booktitle={Hu
 man Language Technologies: The 2010 Annual Conference of the North America
 n Chapter of the Association for Computational Linguistics}\,\n  pages={46
 5--473}\,\n  year={2010}\,\n  organization={Association for Computational 
 Linguistics}\n}\n\nMany implementations of Latent Dirichlet Allocation (L
 DA)\, including those described in Blei et al. (2003)\, rely at some poin
 t on the removal of stopwords\, words which are assumed to contribute lit
 tle to the meaning of the text. This step is considered necessary becaus
 e otherwise high-frequency words tend to end up scattered across many of t
 he latent topics without much rhyme or reason. We show\, however\, that t
 he ‘problem’ of high-frequency words can be dealt with more elegantly
 \, and in a way that to our knowledge has not been considered in LDA\, th
 rough the use of appropriate weighting schemes comparable to those somet
 imes used in Latent Semantic Indexing (LSI). Our proposed weighting method
 s not only make theoretical sense\, but can also be shown to improve preci
 sion significantly on a non-trivial cross-language retrieval task.
LOCATION:GS15\, Computer Laboratory
END:VEVENT
END:VCALENDAR
