Evaluating and Regulating Foundation Models
- Speaker: Miri Zilka, Neel Alex, Shoaib Ahmed Siddiqui, University of Cambridge
- Date & Time: Wednesday 21 May 2025, 11:00 - 12:30
- Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38.
Abstract
The emergence of foundation models and generalist AI systems has transformed the landscape of evaluation, introducing complex challenges that go far beyond the closed-domain settings of the past. This reading group aims to explore cutting-edge approaches for assessing these open-domain systems, with an emphasis on both technical evaluation strategies and evolving regulatory frameworks. We will begin by examining the unique difficulties of evaluating open-domain models, considering possible solutions and highlighting the risks of metric manipulation by resourceful actors. Next, we will discuss the methodologies employed by frontier labs for internal evaluation, as well as the interplay between technical validation and policy-driven oversight. Finally, we will explore evaluation in the context of human-machine collaboration, analyzing the challenges of measuring performance and alignment in systems with humans in the loop.
Series
This talk is part of the Machine Learning Reading Group @ CUED series.