
Q-learning and Pontryagin's Minimum Principle


If you have a question about this talk, please contact Dr Ioannis Lestas.


Q-learning is a technique for computing an optimal policy for a controlled Markov chain from observations of the system under a non-optimal policy. It has proven effective for models with finite state and action spaces. This paper establishes connections between Q-learning and nonlinear control of continuous-time models with general state and action spaces. The main contributions are summarized as follows.
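For the finite state-and-action setting mentioned above, the standard tabular algorithm can be sketched as follows. The chain MDP, rewards, and step sizes here are illustrative choices of ours, not from the talk; the point is only that the update is driven by data from a non-optimal (here, uniformly random) behavior policy:

```python
import random

# Tabular Q-learning on a toy 4-state chain: action 1 moves right, action 0
# moves left; reaching the rightmost state yields reward 1 and resets to 0.
N_STATES, ACTIONS = 4, (0, 1)
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    if s2 == N_STATES - 1:
        return 0, 1.0          # goal reached: collect reward, reset to state 0
    return s2, 0.0

random.seed(0)
Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
s = 0
for _ in range(5000):
    a = random.choice(ACTIONS)     # non-optimal (random) behavior policy
    s2, r = step(s, a)
    # Off-policy Q-learning update: bootstrap with the max over next actions
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    s = s2

greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(greedy)   # states 0..2 learn action 1 (move right toward the reward)
```

Because the behavior policy is exploratory and the transitions are deterministic, the learned greedy policy in the visited states matches the optimal one even though the data were never generated optimally.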

  • The starting point is the observation that the “Q-function” appearing in Q-learning algorithms is an extension of the Hamiltonian that appears in Pontryagin's Minimum Principle. Based on this observation, we introduce the steepest descent Q-learning (SDQ-learning) algorithm, which computes the optimal approximation of the Hamiltonian within a prescribed finite-dimensional function class.
  • A transformation of the optimality equations is performed using the adjoint of a resolvent operator. This transformation is used to construct a consistent stochastic-approximation algorithm that requires only causal filtering of the time-series data.
  • Several examples illustrate the application of these techniques, including distributed control of multi-agent systems.
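To make the first bullet concrete in a simple setting: for a scalar discounted linear-quadratic problem the Q-function is itself quadratic, so a three-parameter quadratic class contains it exactly, and steepest descent on the squared Bellman residual can recover it. The system, features, grid, and step sizes below are our own toy choices; this is only a sketch of approximating a Q-function within a finite-dimensional class, not the SDQ-learning algorithm of the talk:

```python
# Steepest descent fitting of a parameterized Q-function (illustrative
# sketch; system and hyperparameters are our own choices, not the talk's).
A, B, GAMMA = 0.8, 1.0, 0.9                 # dynamics x' = A*x + B*u
cost = lambda x, u: x * x + u * u           # quadratic stage cost

def q(p, x, u):
    # Finite-dimensional function class: Q_p(x,u) = p0*x^2 + p1*x*u + p2*u^2
    return p[0] * x * x + p[1] * x * u + p[2] * u * u

def q_min(p, x):
    # min over u of Q_p(x,u), attained at u = -p1*x/(2*p2)  (requires p2 > 0)
    return (p[0] - p[1] * p[1] / (4.0 * p[2])) * x * x

pts = [(i / 5.0, j / 5.0) for i in range(-5, 6) for j in range(-5, 6)]
p, lr = [1.0, 0.0, 1.0], 0.1
for _ in range(5000):
    grad = [0.0, 0.0, 0.0]
    for x, u in pts:
        x2 = A * x + B * u
        # Bellman residual, with the bootstrapped target held fixed so only
        # Q_p(x,u) is differentiated (semi-gradient descent):
        r = q(p, x, u) - (cost(x, u) + GAMMA * q_min(p, x2))
        for i, g in enumerate((x * x, x * u, u * u)):
            grad[i] += r * g
    p = [pi - lr * gi / len(pts) for pi, gi in zip(p, grad)]

gain = -p[1] / (2.0 * p[2])                 # learned greedy feedback u = gain*x

# Reference: discounted Riccati fixed point for the same LQ problem
P = 1.0
for _ in range(200):
    P = 1.0 + GAMMA * A * A * P - (GAMMA * A * B * P) ** 2 / (1.0 + GAMMA * B * B * P)
gain_true = -GAMMA * A * B * P / (1.0 + GAMMA * B * B * P)
```

Because the true Q-function lies inside the chosen class, the descent drives the Bellman residual toward zero and the learned greedy gain approaches the Riccati solution; with a coarser function class, the same procedure would instead return the best approximation within the class.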

This talk is part of the CUED Control Group Seminars series.


