Text Data Mining using Topic Modeling
- Speaker: Ioana Bica, Churchill College
- Date & Time: Wednesday 02 November 2016, 19:00 - 19:30
- Venue: Club Room, Churchill College
Abstract
As we gather more and more data, it becomes increasingly difficult to find the information we need. Text data mining tools can help by organizing this information in a useful and accessible way. In particular, discovering the patterns in a document using topic modeling can help us annotate and search through documents based on their themes. My talk will present how Latent Dirichlet Allocation extracts a fixed number of topics from a document, using a probabilistic model which assumes that each document arises from a generative process. We shall also investigate how a Bayesian nonparametric model, namely the Chinese Restaurant Process, can be employed when the number of topics in a document is not known in advance. Finally, we shall see how topic hierarchies can be built by exploiting the Nested Chinese Restaurant Process.
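To give a flavour of why the Chinese Restaurant Process does not need the number of topics fixed in advance, here is a minimal sketch of its seating rule. It is an illustration only, not material from the talk: the function name `chinese_restaurant_process`, the concentration parameter `alpha`, and the `seed` argument are assumptions introduced for this example. Each new customer joins an existing table with probability proportional to the number of people already seated there, or opens a new table with probability proportional to `alpha`.

```python
import random


def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Seat customers one by one; tables stand in for topics.

    Hypothetical helper for illustration. Requires alpha > 0.
    Returns (assignments, table_counts).
    """
    rng = random.Random(seed)
    table_counts = []  # table_counts[k] = number of customers at table k
    assignments = []   # assignments[n] = table chosen by customer n
    for _ in range(num_customers):
        # Existing table k has weight table_counts[k];
        # the last entry is the weight for opening a new table.
        weights = table_counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_counts):
            table_counts.append(1)      # open a new table
        else:
            table_counts[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_counts


assignments, table_counts = chinese_restaurant_process(100, alpha=1.0)
print(len(table_counts), "tables used for", len(assignments), "customers")
```

The number of tables is not an input: it grows with the data, and larger values of `alpha` tend to produce more tables.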
Series
This talk is part of the Churchill CompSci Talks series.