Settling the sample complexity of online reinforcement learning
- 🎤 Speaker: Yuxin Chen (University of Pennsylvania)
- 📅 Date & Time: Tuesday 11 November 2025, 09:40-10:20
- 📍 Venue: Seminar Room 1, Newton Institute
Abstract
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works have achieved asymptotically optimal regret in online RL, the optimality of these results is guaranteed only in a “large-sample” regime, imposing an enormous burn-in cost before their algorithms can operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem in the context of finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of Monotonic Value Propagation (MVP) achieves a regret matching the minimax lower bound over the entire range of sample sizes, essentially eliminating any burn-in requirement. The key technical innovation lies in the development of a new regret decomposition strategy and a novel analysis paradigm that decouple complicated statistical dependencies, a long-standing challenge in the analysis of online RL in the sample-hungry regime. This is joint work with Zihan Zhang, Jason Lee and Simon Du.
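For a concrete reading of “matching the minimax lower bound over the entire range of sample sizes”, the sketch below spells out the shape of such a guarantee in standard tabular-MDP notation. The symbols S, A, H, and K are assumptions on my part (number of states, number of actions, horizon length, and number of episodes); they are not defined in the abstract itself.

```latex
% A hedged sketch of the guarantee described above, in assumed
% tabular-MDP notation (not taken verbatim from the abstract):
% S = number of states, A = number of actions, H = horizon,
% K = number of episodes; \widetilde{O} hides logarithmic factors.
\[
  \mathrm{Regret}(K)
  \;=\;
  \widetilde{O}\!\left(
    \min\left\{ \sqrt{H^{3} S A K},\; H K \right\}
  \right)
  \quad \text{for all } K \ge 1 .
\]
% The \sqrt{H^3 S A K} term matches the known minimax lower bound
% once K is large, while the trivial cap H K dominates for small K;
% holding simultaneously for every K is what "no burn-in" means here.
```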
Series
This talk is part of the Isaac Newton Institute Seminar Series.