BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Model Evaluations under Uncertain Ground Truth - Taylan Cemgil (Go
 ogle DeepMind Technologies Limited)
DTSTART:20250813T105000Z
DTEND:20250813T112500Z
UID:TALK232615@talks.cam.ac.uk
DESCRIPTION:AI systems undergo thorough evaluations before deployment\, va
 lidating their predictions against a ground truth which is often assumed t
 o be fixed and certain. However\, in many domains\, such as medical applic
 ations\, the ground truth is often curated in the form of differential dia
 gnoses provided by multiple experts. While a single differential diagnosis
  reflects the uncertainty in one expert's assessment\, multiple experts in
 troduce another layer of uncertainty through potential disagreement.\n\nIn
  this talk\, I will argue that ignoring this uncertainty leads to overly o
 ptimistic estimates of model performance\, thereby underestimating the ris
 k associated with particular diagnostic decisions and leading to unanticip
 ated failure modes. We propose a statistical aggregation approach\, in whi
 ch we infer a distribution over the probabilities of the underlying candid
 ate medical conditions themselves\, based on the observed annotations. Thi
 s formulation naturally accounts for potential disagreements between diffe
 rent experts\, as well as the uncertainty stemming from individual differe
 ntial diagnoses\, capturing the entire ground-truth uncertainty. We conclu
 de that\, while assuming a crisp ground truth can be acceptable for many A
 I applications\, a more nuanced evaluation protocol should be used in medi
 cal diagnosis. If time permits\, I will also cover some work based on conf
 ormal methods that can provide statistical guarantees.\n\nBased on joint w
 ork with David Stutz\, Melih Barsbey\, Alan Karthikesalingam\, Arnaud Douc
 et and many others.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
