BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Efficient exploration in linear Markov decision processes - (in Mo
 ller2) - Antoine Moulin (Universitat Pompeu Fabra)
DTSTART:20251124T150000Z
DTEND:20251124T160000Z
UID:TALK241303@talks.cam.ac.uk
DESCRIPTION:We study the problem of reinforcement learning in infinite-hor
 izon discounted linear Markov decision processes (MDPs)\, and propose the 
 first computationally efficient algorithm achieving rate-optimal regret gu
 arantees in this setting. Our main idea is to combine two classic techniqu
 es for optimistic exploration: additive exploration bonuses applied to the
  reward function\, and artificial transitions made to an absorbing state w
 ith maximal return. We show that\, combined with a regularized approximate
  dynamic-programming scheme\, the resulting algorithm achieves a regret of
  order \\tilde{\\mathcal{O}} (\\sqrt{d^3 (1 - \\gamma)^{- 7 / 2} T})\, whe
 re T is the total number of sample transitions\, \\gamma \\in (0\,1) is th
 e discount factor\, and d is the feature dimensionality. The results conti
 nue to hold against adversarial reward sequences\, enabling application of
  our method to the problem of imitation learning in linear MDPs\, where we
  achieve state-of-the-art results.
LOCATION:External
END:VEVENT
END:VCALENDAR