BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Making the World's Scientific Information (More) Organized\, Acces
 sible\, and Usable - Ted Briscoe - University of Cambridge
DTSTART:20100212T120000Z
DTEND:20100212T130000Z
UID:TALK22725@talks.cam.ac.uk
CONTACT:Laura Rimell
DESCRIPTION:Web portals like Google Scholar and ScienceDirect have revolut
 ionized\naccess to scientific information by making it possible to identif
 y\nrelevant papers via keyword search\, and then to browse them on-line.\n
 However\, as scientific information continues to grow exponentially\, and\
 nas (e-)science embraces automation\, keeping abreast of and exploiting\nt
 he information in these papers effectively is becoming impossible.\n\nI'll
  describe a prototype scientific literature search and information\nextrac
 tion system\, developed in collaboration with the FlyBase (Fruit\nFly Geno
 mics) curation team\, designed to support very fine-grained but\nintuitive
  querying and access to information in a collection of papers.\nFlySearch 
 indexes annotated papers and supports integrated search over\nindividual s
 entences and images\, aggregating information across the\ncollection. For 
 example\, one can search captions describing a specific\ngene regulating a
  biological process and restrict the associated images\nto a specific body
  part.\n\nThe system rests on a processing pipeline in which a Portable Do
 cument\nFormat paper is first converted to Scientific eXtensible Mark-up\n
 Language\, preserving its logical structure but\, for example\, separating
 \nimages\, tables\, and references from running text\, and then applying\n
 specialized text and image processing tools to the different components\no
 f the paper. These are able to compute image similarity\, recognize gene\n
 names\, facts about genes\, and their relationships to other biological\ne
 ntities\, etc. They have been designed to be as generic as possible to\nfa
 cilitate application to different areas of science. Where they require\ndo
 main-specific tuning they have been developed using semi-supervised\nmachi
 ne learning methods to minimize such costs.\n\nInitial results suggest tha
 t many aspects of the user interface need\nrefinement but the underlying s
 earch functionality is able to improve\nspeed and precision significantly 
 over keyword-based document-level\nsearch. Nevertheless\, many further cha
 llenges remain\, of which perhaps\nthe most pressing is handling more form
 s of contextually-mediated\nvariant ways of expressing the same meaning\, 
 but we would also like to\nbe able to go beyond finding and extracting rel
 ations between biological\nentities and\, for example\, support (e.g. tem
 poral) reasoning about\nbiological events.
LOCATION:SW01\, Computer Laboratory
END:VEVENT
END:VCALENDAR
