Use of Linguistic Information and Reordering Strategies for Ngram-based Statistical Machine Translation
- Speaker: Adria de Gispert, TALP Research Centre, Univ. Politecnica de Catalunya (UPC), Barcelona, Spain
- Date & Time: Tuesday 31 October 2006, 13:00 - 14:00
- Venue: LR5, Engineering Department, Baker Building
Abstract
This seminar will give an overview of the experience in statistical machine translation at UPC in recent years. First, the Ngram-based SMT system will be described, detailing the bilingual unit definition and the basic feature functions for a monotone language pair. Second, the introduction of linguistic information at various stages will be discussed, including word alignment (investigating the correlation between Alignment Error Rate and translation scores), bilingual unit segmentation and direct translation modelling. Results on English-to-Spanish verb form classification will be reviewed, as well as the impact of morphology reduction on bilingual N-gram formulation. For language pairs exhibiting less monotone word order, the reordering strategies implemented will be presented. In particular, reordered search involving tuple unfolding and extended monotone search by linguistically-driven reordering rules will be compared for Arabic, Chinese and Spanish-to-English tasks. Finally, the seminar will conclude by outlining general future research directions towards improving the performance of current state-of-the-art SMT systems.
Series
This talk is part of the Machine Intelligence Laboratory Speech Seminars series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- CUED Speech Group Seminars
- Guy Emerson's list
- Information Engineering Division seminar list
- LR5, Engineering Department, Baker Building
- Machine Intelligence Laboratory Speech Seminars
- PhD related
- Trust & Technology Initiative - interesting events
- yk449