
Text Data Mining using Topic Modeling


If you have a question about this talk, please contact Matthew Ireland.

Room changed: club room

As we gather ever more data, finding the information we need becomes increasingly difficult. Text data mining tools, however, offer ways of organizing this information so that it is useful and accessible. In particular, discovering the patterns in a document using topic modeling can help us annotate and search through documents based on their themes. My talk will present how Latent Dirichlet Allocation extracts a fixed number of topics from a document using a probabilistic model which assumes that each document arises from a generative process. We shall also investigate how a Bayesian nonparametric model, namely the Chinese Restaurant Process, can be employed when the number of topics in a document is not known in advance. Finally, we shall see how topic hierarchies can be built by exploiting the Nested Chinese Restaurant Process.
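
For readers unfamiliar with the Chinese Restaurant Process, the following is a minimal illustrative sketch in Python and is not taken from the talk itself. It assumes the standard formulation: each new customer joins an existing table with probability proportional to the number of customers already seated there, or opens a new table with probability proportional to a concentration parameter. The function name `chinese_restaurant_process` and the parameters `alpha` and `seed` are chosen here for illustration. In a topic-modeling reading, tables play the role of topics, so the number of topics grows with the data rather than being fixed in advance.

```python
import random


def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Seat customers one at a time and return (assignments, table_sizes).

    assignments[i] is the table index chosen by customer i;
    table_sizes[k] is the number of customers seated at table k.
    alpha (> 0) is the concentration parameter (an assumed name).
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(num_customers):
        # An existing table k has weight table_sizes[k];
        # the option of opening a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[k] += 1    # join an existing table
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    assignments, table_sizes = chinese_restaurant_process(20, alpha=1.0)
    print("table of each customer:", assignments)
    print("customers per table:  ", table_sizes)
```

Larger values of `alpha` tend to produce more tables, while the "rich get richer" weighting means a few tables usually hold most of the customers.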

This talk is part of the Churchill CompSci Talks series.



© 2006-2018, University of Cambridge.