Towards automated understanding of scientific papers
- Speaker: Maria Liakata
- Date & Time: Thursday 21 May 2009, 12:00 - 13:00
- Venue: SW01, Computer Laboratory
Abstract
The large number of scientific papers being published, especially in the life sciences, makes it a challenge for researchers and resource curators to extract and evaluate the knowledge they contain. Automated text-mining methods currently operate mainly on abstracts, but scientists have highlighted the need for automatic processing of the full text. Researchers in information extraction and information retrieval need to be able to recognise areas of interest in papers, and scientists have expressed the need for machine-readable summaries. However, manually producing semantic markup for papers is very time-consuming and cannot scale to the millions of papers already published. We have produced a tool (SAPIENT) and an ontology-based annotation scheme for annotating core scientific concepts (CISP) in research papers: 'Goal', 'Motivation', 'Object', 'Hypothesis', 'Background', 'Model', 'Experiment', 'Method', 'Observation', 'Result' and 'Conclusion'. A corpus of 225 papers covering topics in physical chemistry and biochemistry was annotated at the sentence level by 16 experts using SAPIENT and the CISP-based annotation scheme. Within the SAPIENTA project we plan to use this corpus to enable the automatic recognition of scientific concepts in papers and to generate digital abstracts in both human- and machine-readable formats. We also aim to enable intelligent querying of the content of scientific papers by exploiting this extra semantic information and representing the relevant sections in a first-order logic form that reasoners can handle.
Bio: Dr Maria Liakata has an Oxford DPhil in Computational Linguistics, on the topic of using Inductive Logic Programming to learn pragmatic knowledge from a corpus (Inducing Domain Theories). Since June 2005 she has been a research associate with the Computational Biology group at Aberystwyth University and has worked on interdisciplinary projects, such as the Robot Scientist, involving the automation and formalisation of science.
Series: This talk is part of the NLIP Seminar Series.