
Scalable Non-Markovian Language Modelling


If you have a question about this talk, please contact Dimitri Kartsaklis.

Markov models are a popular means of modelling the underlying structure of natural language, which is naturally represented as sequences and trees. The locality assumption made in low-order Markov models, such as n-gram language models, is limiting: if the data-generating process exhibits long-range dependencies, modelling the distribution well requires taking long-range context into account. On the other hand, higher-order Markov and non-Markovian (infinite-order) models pose computational and statistical challenges during learning and inference. In particular, in the large-data setting their exponential number of parameters often leads to estimation and sampler-mixing issues, while representing the structure of the model, its sufficient statistics, and sampler states can quickly become computationally inefficient and impractical.
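The parameter blow-up described above is easy to see even on a toy corpus (a hypothetical example, not from the talk): as the Markov order n grows, the number of distinct n-grams a model must parameterise grows rapidly, while each count becomes sparser.

```python
# Toy illustration (hypothetical corpus): higher Markov order means many more
# distinct n-grams to estimate, most of them seen only once.
from collections import Counter

corpus = ("the cat sat on the mat because the cat was tired "
          "the dog sat on the rug because the dog was tired").split()

def ngram_counts(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

for n in (1, 2, 3, 5):
    counts = ngram_counts(corpus, n)
    singletons = sum(1 for c in counts.values() if c == 1)
    print(f"n={n}: {len(counts)} distinct n-grams, {singletons} seen once")
```

On real corpora the effect is far more extreme: the space of possible n-grams is |V|^n, which is why naive tables for high-order models are impractical.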

We propose a framework based on compressed data structures that keeps the memory usage of the modelling, learning, and inference steps independent of the order of the model. Our approach scales gracefully with both the order of the Markov model and the data size, is highly competitive with the state of the art in memory and runtime, and allows us to develop both Bayesian and non-Bayesian smoothing techniques. Using this compressed representation of the models, we explore its scalability in two non-Markovian language-modelling settings: large-scale data and infinite context.
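To give the flavour of order-independent storage, the sketch below (an illustrative simplification, not the authors' implementation, which relies on compressed data structures) uses a suffix array over the token stream: counts of n-grams of any order are answered by binary search, so memory depends on the corpus size, never on the Markov order.

```python
# Sketch: a suffix array answers n-gram count queries for ANY order n,
# so the data structure's size is tied to the corpus, not the model order.
import bisect

corpus = "a b r a c a d a b r a".split()

# Suffix array: start positions of all suffixes, sorted lexicographically.
suffixes = sorted(range(len(corpus)), key=lambda i: corpus[i:])

def count(ngram):
    """Count occurrences of an n-gram via two binary searches.
    (For clarity we materialise the truncated keys; a real implementation
    compares against the corpus in place.)"""
    keys = [corpus[i:i + len(ngram)] for i in suffixes]
    lo = bisect.bisect_left(keys, list(ngram))
    hi = bisect.bisect_right(keys, list(ngram))
    return hi - lo

print(count(("a",)))                 # 5
print(count(("a", "b")))             # 2
print(count(("a", "b", "r", "a")))   # 2
```

Truncating the sorted suffixes to length n preserves their order, which is why the two bisections bracket exactly the occurrences of the query n-gram.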

First, we model the Kneser-Ney family of language models and show that our approach is several orders of magnitude more memory-efficient than the state of the art, in both training and testing, while remaining highly competitive in the runtime of both phases. When memory is a limiting factor at query time, our approach is orders of magnitude faster than the state of the art. We then turn to hierarchical nonparametric Bayesian language modelling and develop an efficient sampling mechanism that mitigates the sampler-mixing issue common in large Bayesian models. More precisely, compared with the previous state-of-the-art hierarchical Bayesian language model, our experiments show that our model can be built on 100x larger datasets while being several orders of magnitude smaller and faster to train and query, and it improves on the perplexity of the state-of-the-art Modified Kneser-Ney LM by up to 15%.
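For readers unfamiliar with the Kneser-Ney family mentioned above, here is a minimal interpolated Kneser-Ney bigram model on a toy corpus (a textbook-style sketch; the talk's framework handles the full family at much higher orders).

```python
# Minimal interpolated Kneser-Ney bigram model (toy sketch).
from collections import Counter, defaultdict

tokens = "the cat sat on the mat the cat ate".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens[:-1])          # counts of bigram histories
vocab = sorted(set(tokens))
d = 0.75                                 # absolute discount

# Type statistics: which words follow / precede which.
followers = defaultdict(set)             # history v -> {continuations w}
preceders = defaultdict(set)             # word w -> {histories v}
for (v, w) in bigrams:
    followers[v].add(w)
    preceders[w].add(v)
total_bigram_types = len(bigrams)

def p_kn(w, v):
    """Interpolated Kneser-Ney probability P(w | v)."""
    # Continuation probability: in how many distinct contexts does w occur?
    p_cont = len(preceders[w]) / total_bigram_types
    if unigrams[v] == 0:
        return p_cont                    # unseen history: back off fully
    discounted = max(bigrams[(v, w)] - d, 0.0) / unigrams[v]
    backoff_mass = d * len(followers[v]) / unigrams[v]
    return discounted + backoff_mass * p_cont

print(sum(p_kn(w, "the") for w in vocab))   # sums to 1.0
```

The discount d shaves probability mass off every observed bigram and redistributes it via the continuation distribution, which scores words by how many distinct contexts they appear in rather than by raw frequency.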

This talk is part of the Language Technology Lab Seminars series.



