BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:AI Control - Dmitrii Krasheninnikov\, Lauro Langosco\, University 
 of Cambridge
DTSTART:20240417T100000Z
DTEND:20240417T113000Z
UID:TALK215860@talks.cam.ac.uk
CONTACT:120952
DESCRIPTION:How do we avoid harms from future AI systems that are deployed
  autonomously and are potentially more capable than humans? Most efforts s
 o far have focused on alignment: getting the AI to want what we want. In t
 his presentation\, we focus on an orthogonal approach: developing protocol
 s to deploy AIs safely even if they are misaligned. This approach is calle
 d AI control.\nThe two main ingredients in AI control are\n\nControl proto
 cols – AI deployment protocols designed to be robust to models acting ag
 ainst us (e.g. monitoring the AI’s outputs with a less capable AI system
 )\n\nMethods to evaluate control protocols\n\nIn our presentation we will 
 1) present a paper [1] on using AI control in a setting where a smart but 
 misaligned LLM sometimes tries to insert backdoors into code\, and 2) disc
 uss the broader control agenda and the associated challenges.\n[1] Greenbl
 att\, R.\, Shlegeris\, B.\, Sachan\, K.\, & Roger\, F. (2023). AI Control:
  Improving safety despite intentional subversion. https://arxiv.org/abs/23
 12.06942
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
