Large-scale Retrieval with Ivory and MapReduce

If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

It is commonly acknowledged that web-scale collections have outgrown the capabilities of individual machines, necessitating the use of clusters to tackle many problems in information retrieval. The release of the 25-terabyte, billion-page ClueWeb09 collection in 2009 and the increasing popularity of Hadoop, the open-source implementation of the MapReduce distributed framework, have motivated academic researchers to think more seriously about cluster-based distributed retrieval solutions. In this talk, we will first introduce Ivory, an end-to-end open-source distributed retrieval system built at the University of Maryland, College Park; Ivory takes full advantage of Hadoop and its underlying distributed file system for both indexing and retrieval. We will then present an overview of several research projects that have evolved around Ivory, such as approximate positional indexing for efficient ranked retrieval, scalable monolingual and cross-lingual pairwise document similarity, and automatically extracted pseudo test collections for learning ranking functions for web search.

This talk is part of the Microsoft Research Cambridge, public talks series.


© 2006-2019 Talks.cam, University of Cambridge.