
Variance in Policy Gradient methods and Learning Sequential Latent Variable Models

I will discuss two efforts to improve learning in reinforcement learning (RL). In the first part, I’ll talk about our work towards understanding variance in policy gradient estimators. PPO and TRPO provide strong performance, but at the cost of requiring many on-policy samples, which makes them challenging to use in real-world applications. The high sample requirement stems from high-variance gradient estimates. We explore where this variance comes from and how we can reduce it.
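As a brief sketch of the setting (the notation here is assumed for illustration, not taken from the talk): methods in this family build on the score-function policy gradient, whose variance a state-dependent baseline $b(s_t)$ reduces without introducing bias,

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(R_t - b(s_t)\bigr)\right],$$

where $R_t$ denotes the return from time $t$ onward. The estimator’s variance scales with the spread of the centered returns, which is what motivates the search for better baselines and control variates.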

Switching gears, in the second part, I’ll talk about learning models of the world, which can simplify control by lifting the problem to a lower-dimensional embedding space. Three groups independently introduced the idea of using a particle filter to train highly flexible non-linear sequential latent variable models. A key deficiency of this approach is that the training procedure cannot properly account for temporal dependencies in the data, because it targets the filtering distributions. We introduce learned tilting functions, which let us control the sequence of target distributions that sequential Monte Carlo passes through. In principle, this allows everything to be trained jointly with a coherent objective. I’ll discuss preliminary results and the challenges we have yet to resolve.
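To make the idea concrete (a sketch under assumed notation, not the speaker’s own formulation): with latent states $z_{1:T}$ and observations $x_{1:T}$, sequential Monte Carlo can be run against tilted intermediate targets

$$\pi_t(z_{1:t}) \;\propto\; p(z_{1:t}, x_{1:t})\, r_t(z_t),$$

where $r_t$ is a learned tilting function. The ideal choice $r_t(z_t) = p(x_{t+1:T} \mid z_t)$ turns each intermediate target into a smoothing rather than a filtering distribution, so that resampling decisions can take future observations into account.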

Bio: George Tucker is a researcher on the Google Brain team focusing on reinforcement learning and sequence models. He received his PhD in Mathematics from MIT and previously worked as a researcher in the speech group at Amazon.

This talk is part of the Machine Learning @ CUED series.
