One format to rule them all? How to generate high-quality data for research and industry: notes from the National Physical Laboratory.
- 👤 Speaker: Marina Romancikova - NPL
- 📅 Date & Time: Thursday 08 March 2018, 13:00 - 14:30
- 📍 Venue: Kavli Large Meeting Room, Kavli Building
Abstract
Have you ever come across one of your own project files and wondered what the data are and how you generated them? You are not alone. Scientists store research results in a multitude of formats and locations. The terminology used to describe the data is poorly defined and may vary even within a single phrase; more variations are found across individuals, teams, departments and organisations. Data documentation typically requires a considerable amount of human input and is performed on an ad hoc basis. None of this is made easier by the plethora of ever-changing proprietary formats used by scientific equipment vendors. The result is countless working hours spent on “data archaeology” and the creation of “data cemeteries” rather than “data lakes”.
In the world of medical imaging, these issues have been alleviated by the use of a single data exchange standard between different devices. In other research domains, such a remedy has yet to be found. The National Physical Laboratory (UK), the National Institute of Standards and Technology (USA), AstraZeneca and GlaxoSmithKline have joined forces to make scientific data available, discoverable and understandable. The Cancer Research UK Grand Challenge initiative “Google Earth of Cancer” provides an ideal platform for this undertaking. The initiative encompasses all existing instruments for a novel cancer imaging technology called Mass Spectrometry Imaging (MSI). Researchers from the partner organisations will define a “minimum metadata standard” for MSI, and equipment vendors will be actively encouraged to implement it, leaving scientists free to do the science. MSI data will be stored in object stores as self-describing data objects that can be exchanged between organisations and tagged with features of interest using machine learning and deep learning. We will extend the approach to other scientific data types as the project progresses.
This talk is part of the Data Intensive Science Seminar Series.