BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:The Web as an Implicit Training Set: Application to Noun Compounds
 ' Syntax and Semantics - Preslav Nakov - National University of Singapore
DTSTART:20100129T120000Z
DTEND:20100129T130000Z
UID:TALK22723@talks.cam.ac.uk
CONTACT:Laura Rimell
DESCRIPTION:I will present Web-based approaches to \nthe syntax and semant
 ics of noun compounds (NCs)\,\nwhich can be used in query parsing\, techni
 cal term understanding\, etc.\nI will also describe an application to mach
 ine translation.\n\nFirst\, I will present a highly accurate\, lightly su
 pervised method \nbased on surface features and paraphrases for\nmaking b
 rack
 eting decisions for three-word noun compounds\,\ne.g. "[[liver cell] antib
 ody]" is left-bracketed\, \nwhile "[liver [cell line]]" is right-bracketed
 .\nThe enormous size of the Web makes such features \nfrequent enough to b
 e useful.\n\nSecond\, I will introduce an unsupervised method \nfor discov
 ering the implicit predicates characterizing \nthe semantic relations that
  hold in noun-noun compounds. \nFor example\, "malaria mosquito" is a \n"m
 osquito that carries/spreads/causes/transmits/brings/infects with/... mala
 ria".\n\nFinally\, I will present a method for improving Statistical Mac
 hine Translation (SMT).\nMost modern SMT systems rely on aligned sentenc
 es of bilingual corpora \nfor training. I will describe a method for exp
 anding the trainin
 g set \nwith conceptually similar but syntactically differing paraphrases 
 \nat the NP level that involve NCs. The English-to-Spanish evaluation \no
 n the Europarl corpus shows an improvement equivalent to 33%-50% \nof that
  of doubling the amount of training data.
LOCATION:SW01\, Computer Laboratory
END:VEVENT
END:VCALENDAR