Nonparametric Bayesian Natural Language Model Domain Adaptation: A Hierarchical, Hierarchical Pitman-Yor Process Language Model

If you have a question about this talk, please contact Zoubin Ghahramani.

There are many real-world modeling problems for which one may not have enough training data to reliably estimate a useful model. Obtaining sufficient training data for these “specific” modeling domains can be both costly and logistically challenging. In some cases, however, a large quantity of training data already exists for a related or more general domain. The term “domain adaptation” describes modeling techniques that use such copious but general data to improve modeling of specific domains for which training data is less readily available. Various language model domain adaptation approaches have been proposed; however, this work is the first to show how to perform domain adaptation with hierarchical nonparametric Bayesian language models.

Specifically, we define a hierarchy of hierarchical Pitman-Yor process language models and explain how such a model accomplishes domain adaptation: intuitively, it “backs off” to both in-domain and out-of-domain models, in a way similar in spirit to how smoothed n-gram models back off within a single domain. For estimation and inference we define a novel multi-floor Chinese restaurant franchise representation and sampler. Encouragingly, on various natural language corpora our new approach to domain adaptation outperforms all of the existing approaches against which it was compared.
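The “backing off to both in- and out-of-domain models” idea can be illustrated with the standard hierarchical Pitman-Yor predictive rule, in which a restaurant for context u with discount d and concentration θ gives P(w | u) = (c_uw − d·t_uw + (θ + d·t_u·)·P_base(w)) / (θ + c_u·), where c counts customers and t counts tables. The sketch below is a minimal bigram illustration under loudly stated assumptions: the `Restaurant` class, the helper functions, discount 0.5, concentration 1.0, the mixing weight `lam`, and every count are hypothetical. It also assumes that the in-domain base at each level is a fixed linear mixture lam·(in-domain parent) + (1 − lam)·(out-of-domain distribution for the same context). That mixture is one plausible reading of “backing off to both”, not necessarily the talk's exact construction. No multi-floor Chinese restaurant franchise sampler is included; the seating counts are fixed placeholders rather than sampled, and they are not kept consistent across levels as a real sampler would require.

```python
from collections import defaultdict


class Restaurant:
    """Seating counts for one Pitman-Yor restaurant (one context u).

    c[w] is the number of customers eating dish w (c_uw);
    t[w] is the number of tables serving dish w (t_uw).
    """

    def __init__(self, discount, concentration):
        assert 0.0 <= discount < 1.0 and concentration > -discount
        self.d = discount
        self.theta = concentration
        self.c = defaultdict(int)
        self.t = defaultdict(int)

    def set_counts(self, word, customers, tables):
        # A dish with customers occupies between 1 and `customers` tables.
        assert (customers == 0 and tables == 0) or 1 <= tables <= customers
        self.c[word] = customers
        self.t[word] = tables

    def predictive(self, word, base_prob):
        """Pitman-Yor predictive probability of `word` given its base probability."""
        c_total = sum(self.c.values())
        t_total = sum(self.t.values())
        if c_total == 0:
            return base_prob  # empty restaurant: defer entirely to the base
        own = self.c[word] - self.d * self.t[word]
        back_off = (self.theta + self.d * t_total) * base_prob
        return (own + back_off) / (self.theta + c_total)


VOCAB = ["the", "cat", "sat", "mat"]
UNIFORM = 1.0 / len(VOCAB)  # root base distribution


def out_of_domain_prob(word, prev, out_uni, out_bi):
    """Ordinary bigram HPYLM: bigram restaurant backs off to unigram, then uniform."""
    p_uni = out_uni.predictive(word, UNIFORM)
    rest = out_bi.get(prev)
    return rest.predictive(word, p_uni) if rest else p_uni


def in_domain_prob(word, prev, in_uni, in_bi, out_uni, out_bi, lam):
    """In-domain bigram whose base at each level mixes two parents (assumed form)."""
    # Unigram level: in-domain parent is uniform; out-of-domain partner is its unigram.
    out_u = out_uni.predictive(word, UNIFORM)
    p_uni = in_uni.predictive(word, lam * UNIFORM + (1 - lam) * out_u)
    # Bigram level: in-domain parent is p_uni; partner is the out-of-domain bigram.
    out_b = out_of_domain_prob(word, prev, out_uni, out_bi)
    base = lam * p_uni + (1 - lam) * out_b
    rest = in_bi.get(prev)
    return rest.predictive(word, base) if rest else base


def build(counts, d=0.5, theta=1.0):
    """Create a restaurant from hypothetical {word: (customers, tables)} counts."""
    r = Restaurant(d, theta)
    for w, (c, t) in counts.items():
        r.set_counts(w, c, t)
    return r


# Hypothetical counts: plentiful general (out-of-domain) data, sparse in-domain data.
out_uni = build({"the": (10, 3), "cat": (4, 2), "sat": (3, 2), "mat": (2, 1)})
out_bi = {"the": build({"cat": (3, 1), "mat": (2, 1)})}
in_uni = build({"the": (2, 1), "mat": (1, 1)})
in_bi = {"the": build({"mat": (1, 1)})}

for w in VOCAB:
    p = in_domain_prob(w, "the", in_uni, in_bi, out_uni, out_bi, lam=0.5)
    print(f"P({w} | the) = {p:.3f}")
```

Because each level mixes two normalized distributions before applying the Pitman-Yor rule, the in-domain predictive distribution still sums to one. Setting `lam = 1` recovers an ordinary in-domain HPYLM, while smaller values let sparse in-domain contexts borrow more from the out-of-domain model.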

This talk is part of the Machine Learning @ CUED series.


© 2006-2020 Talks.cam, University of Cambridge.