One format to rule them all? How to generate high quality data for research and industry: notes from the National Physical Laboratory.

If you have a question about this talk, please contact James Fergusson.

Have you ever opened one of your own project files and wondered what the data are and how you generated them? You are not alone. Scientists store research results in a multitude of formats and locations. The terminology used to describe the data is poorly defined and may vary even within a single phrase; more variation is found across individuals, teams, departments and organisations. Data documentation typically requires a considerable amount of human input and is performed on an ad hoc basis. None of this is made easier by the plethora of ever-changing proprietary formats used by scientific equipment vendors. The result is countless working hours spent on “data archaeology” and the creation of “data cemeteries” rather than “data lakes”.

In medical imaging, these issues have been alleviated by the use of a single data exchange standard between different devices. In other research domains such a remedy is still to be found. The National Physical Laboratory (UK), the National Institute of Standards and Technology (USA), AstraZeneca and GlaxoSmithKline have joined forces to make scientific data available, discoverable and understandable. The Cancer Research UK Grand Challenge initiative “Google Earth of Cancer” provides an ideal platform for this undertaking. The initiative encompasses all existing instruments for a novel cancer imaging technology called Mass Spectrometry Imaging (MSI). Researchers from the partner organisations will define a “minimum metadata standard” for MSI, and the equipment vendors will be actively encouraged to implement it, leaving scientists to do the science. MSI data will be stored in object stores as self-describing data objects that can be exchanged between organisations and tagged with features of interest using machine learning and deep learning. We will extend the approach to other scientific data types as the project progresses.
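
As a purely illustrative sketch of what a self-describing data object might look like, the Python below bundles a descriptor with a raw payload and rejects objects that lack a minimum set of metadata. The field names in REQUIRED_FIELDS and the helpers describe, add_tag and to_json are hypothetical and chosen only for this example; the actual minimum metadata standard is still to be defined by the project partners.

```python
import hashlib
import json

# Hypothetical required fields, for illustration only. The real
# "minimum metadata standard" has not been defined in this abstract.
REQUIRED_FIELDS = {"instrument", "technique", "sample_id", "acquired_at"}


def describe(payload: bytes, metadata: dict) -> dict:
    """Build a descriptor that travels with the raw payload.

    Raises ValueError if any hypothetical required field is missing.
    """
    missing = REQUIRED_FIELDS - set(metadata)
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    return {
        "metadata": dict(metadata),  # copy, so the caller's dict is untouched
        "tags": [],                  # features of interest, added later
        "sha256": hashlib.sha256(payload).hexdigest(),  # integrity check
        "size_bytes": len(payload),
    }


def add_tag(descriptor: dict, tag: str) -> None:
    """Attach a feature-of-interest tag, ignoring duplicates."""
    if tag not in descriptor["tags"]:
        descriptor["tags"].append(tag)


def to_json(descriptor: dict) -> str:
    """Serialise the descriptor so it can be stored next to the payload."""
    return json.dumps(descriptor, sort_keys=True)
```

Storing the checksum and size alongside the metadata lets a receiving organisation verify that the payload it fetched is the one the descriptor refers to. In this sketch the tags are plain strings; in practice they might be produced by a machine learning model rather than entered by hand.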

This talk is part of the Data Intensive Science Seminar Series.

© 2006-2020 Talks.cam, University of Cambridge.