Nash and Nemirovski walk into a bar: LLM alignment with Mirror Descent and Proximal Methods
- 🎤 Speaker: Michal Valko (INRIA Lille - Nord Europe Research Centre)
- 📅 Date & Time: Tuesday 11 November 2025, 14:00 - 14:40
- 📍 Venue: Seminar Room 1, Newton Institute
Abstract
Traditional Reinforcement Learning from Human Feedback (RLHF) typically relies on reward models and preference structures such as the Bradley–Terry model. While effective in some cases, these assumptions fail to capture the richness of human preferences, which often exhibit phenomena such as intransitivity. In this talk, we present Nash Learning from Human Feedback, a more direct alternative that frames the problem as finding a Nash equilibrium in a game induced by human preferences. This perspective provides a principled way to model complex, potentially non-transitive preferences without the need to introduce a reward model. We will survey methods for approximating Nash equilibria in this setting, with a focus on fine-tuning large language models. In particular, we show how (approximate) proximal optimization methods—notably the Nash-MD and Mirror Prox algorithms—can be adapted to achieve fast and stable convergence in this setting. Finally, we discuss practical strategies for efficiently implementing these approximate proximal methods in large-scale training.
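To make the idea concrete, here is a minimal sketch (not the speaker's actual method) of self-play mirror descent on a toy intransitive preference game. The preference matrix `P`, the step size `eta`, and the iteration count are all illustrative assumptions; with an intransitive, rock-paper-scissors-style preference structure, the averaged mirror-descent iterate approaches the uniform Nash equilibrium, which no single Bradley–Terry reward model could represent.

```python
import numpy as np

# Hypothetical intransitive preference matrix (rock-paper-scissors style):
# P[i, j] = probability a rater prefers response i over response j.
P = np.array([[0.5, 0.9, 0.1],
              [0.1, 0.5, 0.9],
              [0.9, 0.1, 0.5]])

def nash_mirror_descent(P, eta=0.2, steps=5000):
    """Self-play entropic mirror descent (exponentiated gradient) on the
    symmetric preference game; the averaged iterate approximates a Nash
    equilibrium policy over responses."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)   # start from the uniform policy
    avg = np.zeros(n)
    for _ in range(steps):
        # Expected preference of each response against the current policy.
        payoff = P @ pi
        # Mirror descent step with entropic regularizer: multiplicative
        # weights update followed by renormalization onto the simplex.
        pi = pi * np.exp(eta * payoff)
        pi /= pi.sum()
        avg += pi
    return avg / steps

pi_star = nash_mirror_descent(P)
```

The last iterate of plain mirror descent cycles around the equilibrium in games like this, which is precisely the instability that extra-gradient schemes such as Mirror Prox are designed to remove; here we simply average the iterates instead.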
Series: This talk is part of the Isaac Newton Institute Seminar Series.
Included in Lists
- All CMS events
- bld31
- dh539
- Featured lists
- INI info aggregator
- Isaac Newton Institute Seminar Series
- School of Physical Sciences
- Seminar Room 1, Newton Institute
Note: Ex-directory lists are not shown.