Applied Probabilistic Algorithms for Big Data Analysis

If you have a question about this talk, please contact Advait Sarkar.

Introductory algorithms courses encourage us to think of computers as perfect machines that calculate exact answers, and we typically design programs to provide exactly this kind of perfection. However, it is possible to construct far more efficient algorithms by relaxing the zero-error constraint: the demand for space and time can be drastically reduced in exchange for a small, quantifiable probability of error.

In this lecture, we will follow the journey of MildlyInappropriateCatAppreciationSociety.com and its competitors as they try to tackle some of the problems of managing large amounts of cat-related data. Motivated by examples and terrible cat puns, you will learn five probabilistic techniques that allow you to do things such as:
  • efficiently test whether an item is already present in a gigantic distributed database
  • efficiently count the number of distinct items in said big database
  • efficiently tabulate the frequencies of different items in said big database
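
The abstract does not name the five techniques, so the following is only an illustrative sketch of the first task, approximate membership testing, using a Bloom filter, which is one standard structure for it. The class name, the sizing parameters, and the use of SHA-256 with double hashing are assumptions made for this example, not details taken from the lecture.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives.

    Hypothetical sketch; not the lecture's own implementation.
    """

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive k bit positions
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "Present" only if all k bits are set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
seen.add("grumpy_cat.jpg")
print("grumpy_cat.jpg" in seen)  # True: an added item is always reported present
```

The trade-off is the one described above: the filter stores roughly ten bits per expected item instead of the items themselves, and in exchange a lookup for an item that was never added may occasionally return true.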

You will learn these techniques and their error bounds in sufficient detail that you will be able to implement them once the lecture is finished. They can all be implemented in a few dozen lines of code!
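
To give a feel for how compact such techniques can be, here is a minimal sketch of a Count-Min sketch, a standard structure for approximate frequency counting. Whether the lecture covers this particular structure is an assumption; the class name, the parameter names `epsilon` and `delta`, and the SHA-256 row hashing are likewise choices made for this example. The textbook error bound assumes pairwise-independent hash functions, which the salted SHA-256 used here only approximates heuristically.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency table (hypothetical sketch).

    With ideal hash functions, estimate(x) is never below the true count and
    exceeds it by more than epsilon * total with probability at most delta.
    """

    def __init__(self, epsilon: float, delta: float) -> None:
        self.width = math.ceil(math.e / epsilon)      # counters per row
        self.depth = math.ceil(math.log(1 / delta))   # number of rows
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0                                # sum of all counts added

    def _column(self, item: str, row: int) -> int:
        # Salt the hash with the row index so each row behaves like a
        # different hash function (an assumption of this sketch).
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        self.total += count
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(
            self.table[row][self._column(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch(epsilon=0.001, delta=0.01)
for breed in ["tabby", "tabby", "siamese", "tabby"]:
    sketch.add(breed)
print(sketch.estimate("tabby"))  # never below the true count of 3
```

The memory used depends only on `epsilon` and `delta`, not on how many distinct items are seen, which is what makes the structure attractive for very large data sets.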

The theme of this lecture was inspired by a talk by Christian Steinruecken.

This talk is part of the Computer Laboratory Research Students' Lectures 2014 series.
