Cross-Lingual Word Embeddings in 60 Minutes

If you have a question about this talk, please contact Edoardo Maria Ponti.

Abstract: In the recent past, NLP as a field has benefited tremendously from using word embeddings as features in downstream tasks. The fact that these word vectors can be trained on unlabeled monolingual corpora makes them an inexpensive resource in NLP. With the increasing use of monolingual word vectors, there is a growing need for word vectors that work as effectively across multiple languages as they do monolingually. Learning bilingual and multilingual word embeddings is therefore currently an important research topic. These vectors offer an elegant, language-pair-independent way to represent content across different languages in shared cross-lingual embedding spaces, and they also enable the integration of knowledge from external resources (e.g., WordNet, dictionaries) into those spaces. In this mini-tutorial, I will briefly discuss current techniques in cross-lingual word embedding learning, presenting a model typology based on multilingual training data requirements, including very recent zero-supervision methods that require no bilingual data at all. I will then introduce several illustrative applications of the induced embedding spaces, including bilingual dictionary induction, ad-hoc cross-lingual information retrieval, and cross-lingual transfer for dependency parsing and dialogue state tracking.
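The abstract surveys a family of techniques rather than a single method. As a concrete illustration of one widely used projection-based approach (not necessarily the one presented in the talk), the sketch below aligns two pre-trained monolingual embedding spaces with an orthogonal Procrustes mapping learned from a small seed dictionary, then performs bilingual dictionary induction by nearest-neighbour search. The function names and the NumPy-based setup are illustrative assumptions.

import numpy as np

# Illustrative sketch only: projection-based cross-lingual embedding
# alignment via orthogonal Procrustes, one standard technique in this
# area; not necessarily the specific method covered in the talk.
# X and Y are hypothetical (n, d) matrices holding source- and
# target-language vectors for the n word pairs of a seed dictionary.

def learn_mapping(X, Y):
    # Solve min_W ||XW - Y||_F with W orthogonal; the closed-form
    # solution is W = U V^T, where U S V^T is the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_dictionary(src_vecs, tgt_vecs, W, k=1):
    # Bilingual dictionary induction: map source vectors into the
    # target space and rank target words by cosine similarity.
    mapped = src_vecs @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = mapped @ tgt.T
    return np.argsort(-sims, axis=1)[:, :k]  # top-k target indices per source word

With real embeddings, the returned indices would be looked up in the target-language vocabulary to produce candidate translations; zero-supervision methods mentioned in the abstract replace the seed dictionary with, e.g., adversarially or heuristically induced initial alignments.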

This talk is part of the Language Technology Lab Seminars series.
