Mechanistic Interpretability - Progress and Limits

Arthur Conmy (Google DeepMind)
Tuesday 03 March 2026, 16:00-17:00
Lecture Theatre 2, Computer Laboratory, William Gates Building.

If you have a question about this talk, please contact Mateja Jamnik.

Note the unusual time.

In the broadest sense, mechanistic interpretability refers to explaining the behavior of neural networks in terms of their internal components. We cover early work on vision models, transformer circuits, and automated circuit discovery. We then turn to superposition (what it means mathematically and why we think it occurs in modern transformer language models), the linear representation hypothesis, and sparse autoencoders. Finally, we discuss recent applications in deployed AI systems and offer a balanced perspective on when mechanistic interpretability is the right tool, and when other approaches may be more appropriate, as AI systems become more capable.

Bio: Arthur Conmy is a Senior Research Engineer at Google DeepMind. He has produced foundational mechanistic interpretability research, including Interpretability in the Wild (ICLR) and ACDC: Automated Circuit Discovery (NeurIPS 2023), and has recently added activation probes to live Gemini deployments to detect misuse.

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.
