
Representation Learning for Text Retrieval: Learning and Pretraining Strategies for Dense Retrieval


If you have a question about this talk, please contact James Thorne.

Unusual date and time

Join Zoom Meeting https://cl-cam-ac-uk.zoom.us/j/95119479973?pwd=RGFYZndIVVhDWEtySy8wV3VTZlpnZz09

Meeting ID: 951 1947 9973 Passcode: 602575

Text retrieval is one of the most prominent tasks for language technologies. It is an end application in itself, powering search engines for billions of users. It can also serve as the first-stage retrieval component for other language systems, such as question answering and information extraction. Since the 1970s, text retrieval has been done by matching queries and documents in a sparse, bag-of-words space, e.g., using BM25. We joked that every year we saw techniques that improved on BM25 by 10%, yet decades later our research is still working on a 10% improvement over BM25. Dense retrieval provides a unique opportunity to overcome the limitations of sparse retrieval based on bag-of-words matching. With pretrained language models, we can now encode queries and documents into a single embedding space and conduct reasonable first-stage retrieval purely using embedding similarities. In this talk, I will first recap recent progress in dense retrieval. I will then present our upcoming ICLR 2021 paper (ANCE) on better training of dense retrieval models with approximate nearest neighbor contrastive learning. The obstacles we encountered in dense retrieval training led us to question how well pretrained language models align with the needs of dense retrieval. In the last part of the talk, I will present our ongoing work (Seed-Encoder) on designing pretraining strategies dedicated to dense retrieval.
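As a rough illustration of what "retrieval purely using embedding similarities" means, the sketch below ranks documents by cosine similarity to a query vector. It is not the method from the talk: the 3-dimensional vectors, the document ids and the `retrieve` helper are all hypothetical, and a real system would obtain the embeddings from a pretrained encoder and search them with an approximate nearest neighbor index rather than a full scan.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query (exhaustive scan)."""
    q = normalize(query_vec)
    scores = {doc_id: dot(q, normalize(vec)) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical embeddings; in practice these come from a pretrained encoder.
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

top = retrieve(query_vec, doc_vecs, k=2)
print(top)
```

Unlike bag-of-words matching, nothing here depends on the query and the document sharing terms; the ranking is determined entirely by where the encoder places them in the embedding space.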

This talk is part of the NLIP Seminar Series.

