Multilingual Models for Distributed Semantics

If you have a question about this talk, please contact Tamara Polajnar.

In this talk I will present a technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. These models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences, using a form of noise-contrastive update.
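As a rough illustration of the kind of objective described above, the following is a minimal sketch, not the speaker's actual model. It assumes sentence vectors are formed by summing word vectors, that distance is squared Euclidean, and that the noise-contrastive update takes the form of a margin-based hinge loss; the names `compose`, `energy`, `contrastive_hinge_loss` and the `margin` parameter are hypothetical and do not come from the abstract.

```python
def compose(tokens, embeddings):
    """Build a sentence vector by summing word vectors.

    Assumption: simple additive composition; the abstract does not
    say which composition function is used.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(embeddings[tok]):
            total[i] += x
    return total


def energy(u, v):
    """Squared Euclidean distance between two sentence vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_hinge_loss(src, tgt, noise, margin=1.0):
    """Hinge loss over one aligned pair and a set of noise sentences.

    The loss is zero once the aligned pair (src, tgt) is closer than
    every (src, noise) pair by at least `margin`. Assumption: this
    margin formulation stands in for the "noise-contrastive update".
    """
    positive = energy(src, tgt)
    return sum(max(0.0, margin + positive - energy(src, n)) for n in noise)


# Toy bilingual vocabulary with hand-picked 2-d vectors (illustrative only).
emb_en = {"the": [0.1, 0.0], "cat": [0.9, 0.2], "sleeps": [0.0, 0.8]}
emb_de = {"die": [0.1, 0.0], "katze": [0.8, 0.3], "schläft": [0.1, 0.7],
          "regnet": [-0.9, -0.5], "es": [0.0, -0.1]}

en = compose(["the", "cat", "sleeps"], emb_en)
de = compose(["die", "katze", "schläft"], emb_de)      # translation of `en`
de_noise = compose(["es", "regnet"], emb_de)           # unrelated sentence

loss = contrastive_hinge_loss(en, de, [de_noise])
```

Note that the sketch only needs sentence-aligned pairs: no word alignment or parse is consulted when computing the loss. In training, the gradient of this loss with respect to both embedding tables would be used to update them, with noise sentences typically sampled at random from the corpus.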

A nice feature of these models is that they do not rely on word alignments or any syntactic information, making them easy to apply to a large number of diverse languages. I will also briefly describe an extension of this approach for learning semantic representations at the document level.

The talk will conclude with an analysis of these models and some empirical evaluation. Using several cross-lingual document classification tasks, I will show that this approach can be used to learn semantically plausible, multilingual distributed representations.

This talk is part of the NLIP Seminar Series.
