Out-of-context reasoning/learning in LLMs and its safety implications
- Speaker: Dmitrii Krasheninnikov, Usman Anwar, University of Cambridge
- Date & Time: Wednesday 02 April 2025, 11:00 - 12:30
- Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38.
Abstract
Beyond learning patterns within individual training datapoints, Large Language Models (LLMs) can infer latent structures and relationships by aggregating information scattered across different training samples through out-of-context reasoning (OOCR) [1, 2]. We’ll review key empirical findings, including Implicit Meta-Learning (models learning source reliability implicitly and subsequently internalizing reliable-seeming data more strongly [1]) and Inductive OOCR (models inferring other latent structures from scattered data [3]). We’ll explore potential mechanisms behind these phenomena [1, 4]. Finally, we’ll discuss the significant AI safety implications, arguing that OOCR coupled with Situational Awareness [5] underpins threats like Alignment Faking [6], potentially leading to persistent misalignment resistant to standard alignment techniques.
1. Krasheninnikov et al., “Implicit meta-learning may lead language models to trust more reliable sources” https://arxiv.org/abs/2310.15047
2. Berglund et al., “Taken Out of Context: On Measuring Out-of-Context Reasoning in LLMs” https://arxiv.org/abs/2309.00667
3. Treutlein et al., “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data” https://arxiv.org/abs/2406.14546
4. Feng et al., “Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts” https://arxiv.org/abs/2412.04614
5. Laine et al., “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” https://arxiv.org/abs/2407.04694
6. Greenblatt et al., “Alignment faking in large language models” https://arxiv.org/abs/2412.14093
Series
This talk is part of the Machine Learning Reading Group @ CUED series.