BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Mining scientific diagrams for semantic information - Dr. Peter Mu
 rray-Rust (University of Cambridge)
DTSTART:20161123T140000Z
DTEND:20161123T150000Z
UID:TALK68310@talks.cam.ac.uk
CONTACT:Emily Boyd
DESCRIPTION:Scientific data is often only reported as diagrams in publicat
 ions and is effectively destroyed and lost. This data is often critically 
 valuable for other scientists and data abstracting services\, and often ha
 s to be recreated manually from the diagram at great expense\, with waste 
 and error. Examples include plots\, charts\, and more complex objects such
  as chemical structure diagrams and phylogenetic (evolutionary) trees.\n\n
 I shall show how\, in favourable circumstances\, it is possible to recreat
 e semantic information from diagrams using well-established Computer Visio
 n techniques. These include thresholding\, binarization\, dilation and thi
 nning\, OCR and a variety of domain-specific heuristics. Our Open Source l
 ibrary is based on BoofCV\, an Open Java Image processing library\, and en
 hanced with tools useful for scientific documents. Some PDF documents cont
 ain vector images and are particularly tractable while others are only pix
 el images and suffer from overlap\, problems of scale and loss of detail.\
 n\nI shall show the application to chemistry and phylogenetics and show wh
 ere errors and loss occur.\n\nSee also my slides from last year at:\n\nhtt
 p://www.slideshare.net/petermurrayrust/mining-scientific-diagrams-for-fact
 s
LOCATION:MR4\, Centre for Mathematical Sciences\, Wilberforce Road\, Cambr
 idge
END:VEVENT
END:VCALENDAR
