BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Convergence of the actor-critic gradient flow for entropy regulari
 sed MDPs in general action spaces - David Siska (University of Edinburgh)
DTSTART:20251113T163000Z
DTEND:20251113T171000Z
UID:TALK238516@talks.cam.ac.uk
DESCRIPTION:We prove the stability and global convergence of a coupled act
 or-critic gradient flow for infinite-horizon and entropy-regularised Marko
 v decision processes (MDPs) in continuous state and action space with line
 ar function approximation under Q-function realisability. We consider a ve
 rsion of the actor-critic gradient flow where the critic is updated using 
 temporal difference (TD) learning while the policy is updated using a poli
 cy mirror descent method on a separate timescale. We demonstrate stability
  and exponential convergence of the actor-critic flow to the optimal poli
 cy. 
 Finally\, we address the interplay of the timescale separation and entropy
  regularisation and its effect on stability and convergence.\nThis is joi
 nt work with Denis Zorba and Lukasz Szpruch.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
