BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Evaluating and Regulating Foundation Models - Miri Zilka\, Neel Al
 ex\, Shoaib Ahmed Siddiqui\, University of Cambridge
DTSTART:20250521T100000Z
DTEND:20250521T113000Z
UID:TALK232570@talks.cam.ac.uk
CONTACT:120952
DESCRIPTION:The emergence of foundation models and generalist AI systems h
 as transformed the landscape of evaluation\, introducing complex challenge
 s that go far beyond the closed-domain settings of the past. This reading 
 group aims to explore cutting-edge approaches for assessing these open-dom
 ain systems\, with an emphasis on both technical evaluation strategies and
  evolving regulatory frameworks. We will begin by examining the unique dif
 ficulties of evaluating open-domain models\, considering possible solution
 s and highlighting the risks of metric manipulation by resourceful actors.
  Next\, we will discuss the methodologies employed by frontier labs for in
 ternal evaluation\, as well as the interplay between technical validation 
 and policy-driven oversight. Finally\, we will explore evaluation in the c
 ontext of human-machine collaboration\, analyzing the challenges of measur
 ing performance and alignment in systems with humans in the loop.
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
