Efficient exploration in linear Markov decision processes - (in Moller2)
- 🎤 Speaker: Antoine Moulin (Universitat Pompeu Fabra)
- 📅 Date & Time: Monday 24 November 2025, 15:00 - 16:00
- 📍 Venue: External
Abstract
We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our main idea is to combine two classic techniques for optimistic exploration: additive exploration bonuses applied to the reward function, and artificial transitions made to an absorbing state with maximal return. We show that, combined with a regularized approximate dynamic-programming scheme, the resulting algorithm achieves a regret of order \tilde{\mathcal{O}}\big(\sqrt{d^3 (1 - \gamma)^{-7/2} T}\big), where T is the total number of sample transitions, \gamma \in (0,1) is the discount factor, and d is the feature dimensionality. The results continue to hold against adversarial reward sequences, enabling application of our method to the problem of imitation learning in linear MDPs, where we achieve state-of-the-art results.
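The abstract does not spell out the algorithm, but the "additive exploration bonuses" it mentions are typically elliptical bonuses of the form \beta \sqrt{\phi^\top \Lambda^{-1} \phi} built from a regularized design matrix of observed features. The sketch below illustrates that standard construction only; the feature dimension, bonus scale, and the clipping used to stand in for the absorbing-state trick are all illustrative assumptions, not the speaker's exact method.

```python
import numpy as np

d = 4        # feature dimension (illustrative)
beta = 1.0   # bonus scale (hypothetical choice)
lam = 1.0    # ridge regularization parameter

# Regularized design matrix accumulated from observed features phi(s, a).
Lambda = lam * np.eye(d)
rng = np.random.default_rng(0)
for _ in range(100):
    phi = rng.normal(size=d)
    phi /= np.linalg.norm(phi)       # keep features on the unit sphere
    Lambda += np.outer(phi, phi)     # rank-one update per observed transition
Lambda_inv = np.linalg.inv(Lambda)

def optimistic_reward(r, phi):
    """Reward plus an elliptical exploration bonus.

    The min(..., 1.0) clip is a crude stand-in for the paper's
    absorbing-state construction, which caps the attainable return.
    """
    bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
    return min(r + bonus, 1.0)
```

The bonus shrinks in directions of feature space that have been visited often (large contribution to Lambda) and stays large in unexplored directions, which is what drives optimistic exploration.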
Series: This talk is part of the Isaac Newton Institute Seminar Series.
Included in Lists
- All CMS events
- bld31
- dh539
- External
- Featured lists
- INI info aggregator
- Isaac Newton Institute Seminar Series
- School of Physical Sciences
Note: Ex-directory lists are not shown.