Wasserstein Natural Gradients for Reinforcement Learning

If you have a question about this talk, please contact Mateja Jamnik.

Join us on Zoom

Policy gradient methods can learn complex behaviours in difficult reinforcement learning tasks but often struggle with data-inefficiency: they make slow progress, requiring frequent rollouts or simulations of the environment. A key to speeding these methods up is to incorporate the information geometry of the space of policies into the optimisation. This can be done via trust regions (TRPO), additive penalties (PPO), or natural gradients.
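
For concreteness, the standard formulations behind these three options (textbook material, not specific to this talk) can be written as:

TRPO's trust-region step: \max_\theta J(\theta) \;\; \text{s.t.} \;\; \mathrm{KL}(\pi_{\theta_t} \,\|\, \pi_\theta) \le \delta

PPO-style additive penalty: \max_\theta \; J(\theta) - \beta\, \mathrm{KL}(\pi_{\theta_t} \,\|\, \pi_\theta)

Natural gradient ascent: \theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{-1} \nabla_\theta J(\theta_t), where F(\theta) = \mathbb{E}_{\pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^\top\big] is the Fisher information matrix.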

In this talk I present a new optimisation approach that can be applied both to policy optimisation and to evolution strategies for reinforcement learning. The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that exploits the geometry induced by a Wasserstein penalty to speed up optimisation. I will illustrate the differences between several natural gradient descent schemes and discuss experiments on challenging tasks which demonstrate improvements in both computational cost and performance over strong baselines.
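
As a rough illustration of the computational skeleton shared by natural gradient schemes (a minimal sketch, not the paper's implementation), each step solves a linear system with the metric matrix, typically via conjugate gradients so the matrix never has to be formed or inverted explicitly. Swapping the Fisher matrix for a Wasserstein information matrix changes only the metric-vector product; the `metric_mvp` callable below is a hypothetical placeholder for it:

import numpy as np

def conjugate_gradient(mvp, b, iters=10, tol=1e-10):
    # Solve G x = b using only matrix-vector products v -> G v,
    # so the metric matrix G never has to be materialised.
    x = np.zeros_like(b)
    r = b.copy()              # residual b - G x (x = 0 initially)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Gp = mvp(p)
        alpha = rs / (p @ Gp)
        x += alpha * p
        r -= alpha * Gp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def natural_gradient_step(theta, grad_J, metric_mvp, step_size=0.1):
    # Precondition the vanilla gradient by the inverse metric:
    # with the Fisher matrix this is the classical natural gradient;
    # WNG uses a Wasserstein metric instead (how to estimate its
    # matrix-vector products efficiently is the subject of the paper).
    direction = conjugate_gradient(metric_mvp, grad_J)
    return theta + step_size * direction   # ascent on the objective J

The point of the matrix-free formulation is that each conjugate-gradient iteration costs only one metric-vector product, which can be estimated from samples without ever storing a matrix quadratic in the number of policy parameters.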

This talk is largely based on https://arxiv.org/abs/2010.05380

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.
