A Polya Urn Document Language Model for Information Retrieval

If you have a question about this talk, please contact Tamara Polajnar.

Although the multinomial language model has been one of the most effective unigram models for information retrieval for over a decade, it does not model one important linguistic phenomenon relating to term dependency: the tendency of a term to repeat itself within a document (i.e. word burstiness).

In this talk I will begin with a brief review of language modelling as applied to information retrieval. I will then present some work near completion in which we model document generation as a random process with reinforcement (a multivariate Polya process) and develop a Dirichlet compound multinomial language model that captures word burstiness. I will show that the new reinforced language model can be computed as efficiently as current retrieval models and that it significantly outperforms the multinomial model on a number of standard effectiveness metrics. I will conclude by presenting an analysis of the retrieval method which shows that it adheres to what is called the “verbosity hypothesis”, and will show that the method essentially combines the term and document event spaces, giving theoretical justification to tf-idf-type schemes.
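To make the reinforcement idea concrete, here is a minimal Python sketch of a multivariate Polya urn and the Dirichlet compound multinomial (DCM) probability it induces. It is an illustration only, not the speaker's implementation: the function names, the toy two-term vocabulary, and the pseudo-counts in `alpha` are all assumptions, and the abstract does not say how the parameters are estimated or how documents are scored against queries.

```python
import math
from collections import Counter


def polya_sequence_log_prob(doc, alpha):
    """Log-probability of an ordered term sequence under a Polya urn.

    `alpha` maps each term to its initial pseudo-count (hypothetical values).
    After each draw the drawn term's count grows by one, so a term that has
    already appeared becomes more likely to appear again (burstiness).
    """
    urn = dict(alpha)
    total = sum(urn.values())
    log_p = 0.0
    for term in doc:
        log_p += math.log(urn[term] / total)
        urn[term] += 1.0  # reinforcement step
        total += 1.0
    return log_p


def dcm_sequence_log_prob(doc, alpha):
    """Closed-form DCM log-probability of the same ordered sequence.

    Uses Gamma(A) / Gamma(A + n) * prod_w Gamma(alpha_w + c_w) / Gamma(alpha_w),
    where A is the sum of alpha, n the document length, c_w the term counts.
    The multinomial coefficient is omitted because the sequence is ordered.
    """
    counts = Counter(doc)
    a_total = sum(alpha.values())
    log_p = math.lgamma(a_total) - math.lgamma(a_total + len(doc))
    for term, c in counts.items():
        log_p += math.lgamma(alpha[term] + c) - math.lgamma(alpha[term])
    return log_p


def repeat_probability(term, alpha):
    """P(second draw == term | first draw == term) under the Polya urn."""
    return (alpha[term] + 1.0) / (sum(alpha.values()) + 1.0)


if __name__ == "__main__":
    alpha = {"a": 1.0, "b": 1.0}  # toy pseudo-counts, chosen for illustration
    doc = ["a", "a", "b"]
    print(math.exp(polya_sequence_log_prob(doc, alpha)))
    print(math.exp(dcm_sequence_log_prob(doc, alpha)))
    print(repeat_probability("a", alpha))
```

In this toy setting the step-by-step urn probability and the closed-form DCM probability agree, and the chance of repeating "a" after drawing it once is 2/3, compared with 1/2 under a multinomial with the same starting proportions. That gap is the burstiness effect the abstract refers to.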

This talk is part of the NLIP Seminar Series.
