
Settling the sample complexity of online reinforcement learning


SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications

A central issue in online reinforcement learning (RL) is data efficiency. While a number of recent works have achieved asymptotically optimal regret in online RL, the optimality of these results is guaranteed only in a “large-sample” regime, imposing an enormous burn-in cost before their algorithms can operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem in the context of finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of Monotonic Value Propagation (MVP) achieves a regret that matches the minimax lower bound over the entire range of sample sizes, essentially eliminating any burn-in requirement. The key technical innovation lies in the development of a new regret decomposition strategy and a novel analysis paradigm that decouples complicated statistical dependence, a long-standing challenge in the analysis of online RL in the sample-hungry regime. This is joint work with Zihan Zhang, Jason Lee and Simon Du.
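For readers wanting the shape of the guarantee: a minimal sketch in standard tabular notation, assuming S states, A actions, horizon H, and K episodes (the abstract itself does not fix notation; the form below is inferred from the well-known minimax lower bound for this setting together with the trivial HK cap):

\[
  \mathrm{Regret}(K)
  \;=\;
  \widetilde{O}\!\left( \min\left\{ \sqrt{H^{3} S A K},\; H K \right\} \right)
  \qquad \text{for all } K \ge 1,
\]

so the upper bound matches the \(\Omega\!\left(\sqrt{H^{3} S A K}\right)\) minimax lower bound, up to logarithmic factors, at every sample size rather than only for K beyond some large burn-in threshold, which is precisely the “no burn-in” property described above.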

This talk is part of the Isaac Newton Institute Seminar Series.
