reading-group: Interpolating Between Types and Tokens by Estimating Power-Law Generators
- 👤 Speaker: Speaker to be confirmed
- 📅 Date & Time: Thursday 19 January 2006, 11:00 - 12:00
- 📍 Venue: Room 911, Rutherford Building, Cavendish Laboratory, Department of Physics
Abstract
http://cog.brown.edu/~gruffydd/papers/typetoken.pdf
Paper-abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
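For readers preparing for the discussion, the following is a minimal sketch of how a Pitman-Yor process can generate power-law-like frequencies, using its standard Chinese restaurant representation. It is not taken from the paper: the function name `pitman_yor_crp` and the parameter names `discount` and `concentration` are assumptions chosen for illustration. Each customer stands for a word token and each table for a draw from the underlying generator, so the table sizes play the role of token frequencies.

```python
import random


def pitman_yor_crp(num_customers, discount, concentration, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Hypothetical helper for illustration only. With n customers already
    seated at K tables, the next customer joins table k with probability
    (n_k - discount) / (n + concentration) and opens a new table with
    probability (concentration + discount * K) / (n + concentration).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must satisfy 0 <= discount < 1")
    if concentration <= -discount:
        raise ValueError("concentration must be greater than -discount")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        if not table_sizes:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * len(table_sizes))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    sizes = pitman_yor_crp(10_000, discount=0.5, concentration=1.0)
    # A few large tables and many small ones is the expected qualitative pattern.
    print(len(sizes), sorted(sizes, reverse=True)[:5])
```

Setting `discount` to 0 recovers the ordinary Chinese restaurant process, where the number of tables grows logarithmically with the number of customers; a positive `discount` makes new tables appear more often and gives the heavier tail.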
Series: This talk is part of the Machine Learning Journal Club series.
Included in Lists
- Cambridge talks
- Guy Emerson's list
- Hanchen DaDaDash
- Inference Group Journal Clubs
- Inference Group Summary
- Interested Talks
- Machine Learning Journal Club
- Machine Learning Summary
- ML
- Quantum Matter Journal Club
- Room 911, Rutherford Building, Cavendish Laboratory, Department of Physics
- rp587
- TQS Journal Clubs
- yk373's list