Nonparametric Bayesian Natural Language Model Domain Adaptation: A Hierarchical, Hierarchical Pitman-Yor Process Language Model
- Speaker: Dr Frank Wood (UCL)
- Date & Time: Wednesday 17 September 2008, 14:00 - 15:00
- Venue: Engineering Department, CBL Room 438
Abstract
There are many real-world modeling problems for which one may not have a sufficient quantity of training data to reliably estimate a useful model. Obtaining sufficient quantities of training data for these “specific” modeling domains can be both costly and a significant logistical challenge. In some cases there may already exist a large quantity of training data from a related or more general domain. The phrase domain adaptation is used to describe modeling techniques that utilize such data (copious but general) to improve modeling of specific domains for which training data is not as readily available. Various language model domain adaptation approaches have been proposed; however, this work is the first to show how to do domain adaptation of hierarchical nonparametric Bayesian language models.
Specifically, we define a hierarchy of hierarchical Pitman-Yor process language models and explain how such a model accomplishes domain adaptation (intuitively, it “backs off” to both in- and out-of-domain models in a way that is similar in spirit to the backing-off that smoothed n-gram models do within a single domain). For estimation and inference we define a novel multi-floor Chinese restaurant franchise representation and sampler. Encouragingly, for various natural language corpora we find that our new approach to domain adaptation outperforms all of the existing approaches against which it was compared.
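The back-off intuition above can be illustrated with a small sketch. It uses the standard Pitman-Yor predictive rule for a single context (a "restaurant" in Chinese restaurant terms) and assumes, purely for illustration, that the base distribution is a weighted mix of an in-domain shorter-context estimate and a general-domain estimate. The function names, the `mix` weight, and the toy counts are hypothetical and are not taken from the talk; the multi-floor sampler itself is not shown.

```python
def pitman_yor_predictive(word, counts, tables, discount, strength, base_prob):
    """Predictive probability of `word` in one Pitman-Yor restaurant.

    counts[w]: number of customers eating dish w in this context.
    tables[w]: number of tables serving dish w in this context.
    base_prob: probability of `word` under the back-off (base) distribution.
    """
    total_customers = sum(counts.values())
    total_tables = sum(tables.values())
    if total_customers == 0:
        # Nothing observed in this context: rely entirely on the back-off.
        return base_prob
    # Mass from sitting at an existing table that already serves `word`.
    seated = counts.get(word, 0) - discount * tables.get(word, 0)
    # Mass from opening a new table, whose dish is drawn from the base.
    new_table = strength + discount * total_tables
    return (seated + new_table * base_prob) / (strength + total_customers)


def mixed_backoff(p_in_domain_shorter, p_general, mix):
    """Assumed back-off: blend in-domain and general-domain estimates."""
    return mix * p_in_domain_shorter + (1.0 - mix) * p_general


# Toy in-domain statistics for a single context (illustrative values only).
counts = {"model": 3, "data": 1}
tables = {"model": 2, "data": 1}

base = mixed_backoff(p_in_domain_shorter=0.2, p_general=0.1, mix=0.7)
p_model = pitman_yor_predictive("model", counts, tables,
                                discount=0.5, strength=1.0, base_prob=base)
print(p_model)
```

With little in-domain data the `new_table` term dominates, so the prediction leans on the blended back-off; as in-domain counts grow, the `seated` term takes over.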
Series
This talk is part of the Machine Learning @ CUED series.