Out-of-context reasoning/learning in LLMs and its safety implications

Dmitrii Krasheninnikov, Usman Anwar, University of Cambridge
Wednesday 02 April 2025, 11:00-12:30
Cambridge University Engineering Department, CBL Seminar room BE4-38.

If you have a question about this talk, please contact .

A Teams link is available upon request; it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk. Sign up to the mailing list via lists.cam.ac.uk for easier reminders.

Beyond learning patterns within individual training data points, Large Language Models (LLMs) can infer latent structures and relationships by aggregating information scattered across different training samples through out-of-context reasoning (OOCR) [1, 2]. We’ll review key empirical findings, including Implicit Meta-Learning (models implicitly learning which sources are reliable and subsequently internalizing reliable-seeming data more strongly [1]) and Inductive OOCR (models inferring other latent structures from scattered data [3]). We’ll explore potential mechanisms behind these phenomena [1, 4]. Finally, we’ll discuss the significant AI safety implications of OOCR, arguing that OOCR coupled with Situational Awareness [5] underpins threats like Alignment Faking [6], potentially leading to persistent misalignment that resists standard alignment techniques.

1. Krasheninnikov et al., “Implicit meta-learning may lead language models to trust more reliable sources.” https://arxiv.org/abs/2310.15047
2. Berglund et al., “Taken Out of Context: On Measuring Out-of-Context Reasoning in LLMs.” https://arxiv.org/abs/2309.00667
3. Treutlein et al., “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.” https://arxiv.org/abs/2406.14546
4. Feng et al., “Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts.” https://arxiv.org/abs/2412.04614
5. Laine et al., “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.” https://arxiv.org/abs/2407.04694
6. Greenblatt et al., “Alignment faking in large language models.” https://arxiv.org/abs/2412.14093

This talk is part of the Machine Learning Reading Group @ CUED series.