BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Static Analysis for Data Scientists - Caterina Urban (INRIA)
DTSTART:20220708T083000Z
DTEND:20220708T093000Z
UID:TALK175796@talks.cam.ac.uk
DESCRIPTION:Big data analytics has revolutionized the world of software de
 velopment in the past decade. Every day\, data scientists write computer p
 rograms to clean\, manipulate\, and visualize data\, in order to help us m
 ake data-driven decisions. As we rely more and more on data analytics soft
 ware\, we become increasingly vulnerable to programming or technical mista
 kes. Mistakes that do not cause software failures can have serious consequ
 ences\, since they give no indication that something went wrong. A simple 
 technical mistake made during data processing caused nearly 16\,000 cases
  of Covid-19 between September 25th and October 2nd\, 2020 to be missing
  from official UK figures. As a consequence\, Public Health England
  was unable to send out the relevant contact-tracing alerts. Mistakes in
  safety-critical applications can be deadly.\n\nIn this talk\, I will
  present ongoing work to develop an abstract interpretation-based static
  analysis framework for data scientists. In particular\, I will focus on
  an analysis that infers necessary conditions on the structure and values
  of the data read by a data analytics program. The analysis builds on a
  family of underlying abstract domains\, extended to reason indirectly
  about the input data rather than only about the program variables. The
  choice of these abstract domains is a parameter of the analysis. We
  describe various instances built from existing abstract domains. We then
  demonstrate the potential of the approach on a number of representative
  examples and discuss ongoing efforts to target data analytics in Jupyter
  notebooks.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
