
Variance in Policy Gradient methods and Learning Sequential Latent Variable Models

I will discuss two efforts to improve learning in reinforcement learning (RL). In the first part, I’ll talk about our work towards understanding variance in policy gradient estimators. PPO and TRPO provide strong performance, but at the cost of requiring many on-policy samples, which makes them challenging to use in real-world applications. The high sample requirement stems from high-variance gradient estimates. We explore where this variance comes from and how we can reduce it.
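As a brief sketch of the setting (the notation here is assumed for illustration, not taken from the talk): methods in this family build on the score-function policy gradient, whose variance a state-dependent baseline $b(s_t)$ reduces without introducing bias,

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(R_t - b(s_t)\bigr)\right],$$

where $R_t$ denotes the return from time $t$ onward. The estimator’s variance scales with the spread of the centered returns, which is what motivates the search for better baselines and control variates.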

Switching gears, in the second part, I’ll talk about learning models of the world, which can simplify control by lifting the problem to a lower-dimensional embedding space. Three groups independently introduced the idea of using a particle filter to train highly flexible non-linear sequential latent variable models. A key deficiency of this approach is that the training procedure cannot properly account for temporal dependencies in the data, because it targets the filtering distributions. We introduce learned tilting functions, which let us control the sequence of target distributions that sequential Monte Carlo passes through. In principle, this allows everything to be trained jointly with a coherent objective. I’ll discuss preliminary results and the challenges we have yet to resolve.
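To make the idea concrete (a sketch under assumed notation, not the speaker’s own formulation): with latent states $z_{1:T}$ and observations $x_{1:T}$, sequential Monte Carlo can be run against tilted intermediate targets

$$\pi_t(z_{1:t}) \;\propto\; p(z_{1:t}, x_{1:t})\, r_t(z_t),$$

where $r_t$ is a learned tilting function. The ideal choice $r_t(z_t) = p(x_{t+1:T} \mid z_t)$ turns each intermediate target into a smoothing rather than a filtering distribution, so that resampling decisions can take future observations into account.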

Bio: George Tucker is a researcher on the Google Brain team focusing on reinforcement learning and sequence models. He received his PhD in Mathematics from MIT and previously worked as a researcher in the speech group at Amazon.

This talk is part of the Machine Learning @ CUED series.
