
Efficient exploration in linear Markov decision processes - (in Moller2)


  • Antoine Moulin (Universitat Pompeu Fabra)
  • Monday 24 November 2025, 15:00-16:00
  • External.


SCL - Bridging Stochastic Control And Reinforcement Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our main idea is to combine two classic techniques for optimistic exploration: additive exploration bonuses applied to the reward function, and artificial transitions made to an absorbing state with maximal return. We show that, combined with a regularized approximate dynamic-programming scheme, the resulting algorithm achieves a regret of order \tilde{\mathcal{O}}(\sqrt{d^3 (1-\gamma)^{-7/2} T}), where T is the total number of sample transitions, \gamma \in (0,1) is the discount factor, and d is the feature dimensionality. The results continue to hold against adversarial reward sequences, enabling application of our method to the problem of imitation learning in linear MDPs, where we achieve state-of-the-art results.
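As an illustration of the first ingredient mentioned above, a minimal sketch of a generic elliptical exploration bonus for a linear MDP is given below. This is not the talk's exact construction; the confidence scale `beta` and the ridge parameter `lam` are illustrative assumptions, and the bonus is the standard LSVI-UCB-style quantity beta * sqrt(phi^T Lambda^{-1} phi), where Lambda is the regularized covariance of previously observed feature vectors.

```python
import numpy as np

def elliptical_bonus(phi, phi_history, beta=1.0, lam=1.0):
    """Optimistic exploration bonus for a state-action feature vector phi.

    Returns beta * sqrt(phi^T Lambda^{-1} phi), where
    Lambda = lam * I + sum_i phi_i phi_i^T over past observed features.
    Directions of feature space visited rarely get a large bonus,
    so adding it to the reward encourages exploring them.
    """
    d = phi.shape[0]
    Lambda = lam * np.eye(d)
    for p in phi_history:
        Lambda += np.outer(p, p)
    # Solve Lambda x = phi instead of forming the explicit inverse.
    return beta * float(np.sqrt(phi @ np.linalg.solve(Lambda, phi)))

# A direction observed many times receives a smaller bonus than a
# direction never observed:
phi_seen = np.array([1.0, 0.0])
phi_new = np.array([0.0, 1.0])
history = [phi_seen] * 50
print(elliptical_bonus(phi_new, history) > elliptical_bonus(phi_seen, history))
```

The bonus shrinks at rate roughly 1/sqrt(n) along directions visited n times, which is what drives sqrt(T)-type regret bounds in this family of methods.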

This talk is part of the Isaac Newton Institute Seminar Series.


 

© 2006-2025 Talks.cam, University of Cambridge.