Statistical Investigations into the Unseen: Missing Mass for Markov Samples and Natural Distribution Estimation
- Speaker: Prof. Andrew Thangaraj, Indian Institute of Technology Madras
- Date & Time: Monday 10 November 2025, 14:00 - 15:00
- Venue: Cambridge University Engineering Department, JDB Seminar Room
Abstract
Suppose we observe a sequence of samples from a very large alphabet, and the number of samples is comparable to or smaller than the alphabet size. Several letters from the alphabet will be unseen, or missing, in the observed samples. What can be inferred about the distribution’s probability mass on the missing letters? The sum of the probability masses on all missing letters is called the missing mass. When the samples are iid, the classical Good-Turing (GT) estimator of the missing mass is minimax optimal over all distributions and alphabet sizes. However, when the samples are Markovian sequences, the GT estimator fails. In this talk, we will introduce a windowed version of the GT estimator and show that, when the window size is sufficiently larger than the mixing time, the windowed GT estimator is nearly minimax optimal. Going beyond missing mass, we will present the generalization to higher-order missing mass and missing g-mass, which can potentially quantify the distance of the missing part of the distribution from uniformity. We will conclude with some extensions of these results to the distribution’s probability mass on sparsely observed letters and their potential impact on distribution estimation.
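For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the missing mass and of the classical Good-Turing estimate for iid samples, which is the fraction of samples whose letter occurs exactly once. The function names, the uniform toy distribution, and the sample sizes are illustrative assumptions, not taken from the talk, and the sketch does not attempt the windowed estimator for Markov samples that the talk introduces.

```python
import random
from collections import Counter


def missing_mass(samples, probs):
    """True missing mass: total probability of letters that never appear in samples.

    `probs` is a dict mapping each letter to its probability (assumed known here
    only so the estimate can be compared with the truth).
    """
    seen = set(samples)
    return sum(p for letter, p in probs.items() if letter not in seen)


def good_turing_missing_mass(samples):
    """Classical Good-Turing estimate: (number of letters seen exactly once) / n."""
    n = len(samples)
    if n == 0:
        # Convention assumed for this sketch: with no samples, everything is missing.
        return 1.0
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


if __name__ == "__main__":
    # Toy iid experiment (hypothetical parameters): uniform alphabet of 1000
    # letters, 500 samples, so many letters are necessarily unseen.
    rng = random.Random(0)
    alphabet = list(range(1000))
    probs = {a: 1 / len(alphabet) for a in alphabet}
    samples = [rng.choice(alphabet) for _ in range(500)]
    print("true missing mass:   ", missing_mass(samples, probs))
    print("Good-Turing estimate:", good_turing_missing_mass(samples))
```

The estimate uses only the observed counts, whereas the true missing mass requires the underlying distribution; the demo computes both so the two can be compared on iid data.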
Series
This talk is part of the Probabilistic Systems, Information, and Inference Group Seminars series.