
Learning via Data Compression: Bayesian Coresets and Sparse Variational Inference


If you have a question about this talk, please contact Robert Peharz.

We have reached a point in many fields of science and technology where we create data at a pace that far outstrips our capacity to process it. While a boon from a statistical perspective, this wealth of data presents a computational challenge: how might we design a model-based inference system that learns forever, retains important past information, doesn’t get bogged down by a persistent stream of new data, and makes inferences with guaranteed statistical quality? The human nervous system provides inspiration; to handle the astounding amount of perceptual data it constantly receives, the nervous system filters and compresses the data significantly before passing it along to the brain where learning occurs. Although a seemingly simple solution, it does raise interesting questions for the design of a computational inference system: how should we decide what data to retain, how should we compress it, and what degree of compression should we apply before learning from it?

This talk will cover recent work on Bayesian coresets (“core of a dataset”), a methodology for statistical inference via data compression. Coresets achieve compression by forming a small weighted subset of data that replaces the full dataset during inference, leading to significant computational gains with provably minimal loss in inferential quality. In particular, the talk will present numerous methods for Bayesian coreset construction, from previously-developed subsampling, greedy, and sparse linear regression-based techniques to novel algorithms based on sparse variational inference (VI). In contrast to past algorithms, the sparse VI-based algorithms are fully automated, requiring only the dataset and probabilistic model specification as inputs. The talk will additionally provide a unifying view and statistical analysis of these methods using the theory of exponential families and Riemannian information geometry. The talk will conclude with empirical results showing that despite requiring much less user input than past methods, sparse VI coreset construction provides state-of-the-art data summarization for Bayesian inference.
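The core idea above — replacing the full dataset with a small weighted subset whose weighted log-likelihood stands in for the full-data log-likelihood — can be illustrated with a minimal sketch. This is not the sparse-VI construction the talk presents (which optimizes the weights), but the simple uniform-subsampling baseline it improves upon; all function names here (`uniform_coreset`, `coreset_log_likelihood`, `gauss_loglik`) are hypothetical illustrations, not from the talk.

```python
import numpy as np

def uniform_coreset(data, m, rng=None):
    """Build a uniform-subsampling Bayesian coreset (hypothetical helper).

    Draws m points uniformly without replacement and weights each by N/m,
    so the weighted coreset log-likelihood is an unbiased estimator of the
    full-data log-likelihood. The talk's sparse-VI methods instead learn
    sparse nonuniform weights.
    """
    rng = np.random.default_rng(rng)
    N = len(data)
    idx = rng.choice(N, size=m, replace=False)
    weights = np.full(m, N / m)
    return data[idx], weights

def coreset_log_likelihood(log_lik_fn, theta, subset, weights):
    # Weighted sum over the coreset replaces the sum over all N points.
    return np.sum(weights * log_lik_fn(theta, subset))

# Toy model: x_i ~ N(theta, 1); per-point log-density log N(x | theta, 1).
def gauss_loglik(theta, x):
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

data = np.random.default_rng(0).normal(1.0, 1.0, size=10_000)
subset, w = uniform_coreset(data, m=100, rng=1)
approx = coreset_log_likelihood(gauss_loglik, 1.0, subset, w)
full = np.sum(gauss_loglik(1.0, data))
```

During posterior inference (e.g. MCMC or VI), the 100-point weighted sum is evaluated in place of the 10,000-point sum, giving a 100x cheaper likelihood at each step; the quality of the approximation is exactly what the coreset-construction algorithms in the talk are designed to control.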

This talk is part of the Machine Learning @ CUED series.


© 2006-2019 Talks.cam, University of Cambridge.