BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Settling the sample complexity of online reinforcement learning - 
 Yuxin Chen (University of Pennsylvania)
DTSTART:20251111T094000Z
DTEND:20251111T102000Z
UID:TALK238456@talks.cam.ac.uk
DESCRIPTION:A central issue lying at the heart of online reinforcement
 learning (RL) is data efficiency. While a number of recent works
 achieved asymptotically minimal regret in online RL\, the optimality
 of these results is only guaranteed in a “large-sample” regime\,
 imposing enormous burn-in cost in order for their algorithms to
 operate optimally. How to achieve minimax-optimal regret without
 incurring any burn-in cost has been an open problem in RL theory. We
 settle this problem for the context of finite-horizon inhomogeneous
 Markov decision processes. Specifically\, we prove that a modified
 version of Monotonic Value Propagation (MVP) achieves a regret that
 matches the minimax lower bound for the entire range of sample
 sizes\, essentially eliminating any burn-in requirement. The key
 technical innovation lies in the development of a new regret
 decomposition strategy and a novel analysis paradigm to decouple
 complicated statistical dependency – a long-standing challenge facing
 the analysis of online RL in the sample-hungry regime. This is joint
 work with Zihan Zhang\, Jason Lee and Simon Du.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
