BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Signal Processing and Communications Lab Seminars
SUMMARY:Fastest Convergence for Reinforcement Learning  - 
 Prof. Sean Meyn\, University of Florida
DTSTART;TZID=Europe/London:20181008T150000
DTEND;TZID=Europe/London:20181008T160000
UID:TALK112225AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/112225
DESCRIPTION: \nThere are two well known Stochastic Approximati
 on techniques that are known to have optimal rate 
 of convergence (measured in terms of asymptotic va
 riance): the Stochastic Newton-Raphson (SNR) algor
 ithm [a matrix gain algorithm that resembles the d
 eterministic Newton-Raphson method]\,  and the Rup
 pert-Polyak averaging technique.   This talk will 
 present new applications of these concepts for rei
 nforcement learning.\n \n1.  Introducing Zap Q-Lea
 rning.   In recent work\, first presented at NIPS 
 201\,  it is shown that a new formulation of SNR p
 rovides a new approach to Q-learning that has prov
 ably optimal rate of convergence under general ass
 umptions\,   and astonishingly quick convergence i
 n numerical examples.  In particular\,  the standa
 rd Q-learning algorithm of Watkins typically has i
 nfinite asymptotic covariance\, and in simulations
  the Zap Q-Learning algorithm exhibits much faster
  convergence than the Ruppert-Polyak averaging met
 hod.   The only difficulty is the matrix inversion
  required in the SNR recursion. \n \n2. A remedy i
 s proposed based on a variant of Polyak’s heavy-ba
 ll method.  For a special choice of the “momentum"
  gain sequence\, it is shown that the parameter es
 timates obtained from the algorithm are essentiall
 y identical to those obtained using SNR.   This ne
 w algorithm does not require matrix inversion.   I
 n simulations it is found that the sample paths of
  the two algorithms couple.   A theoretical explan
 ation for coupling is established for linear recur
 sions.\n\n*Biosketch*\n\nSean Meyn received the BA
  degree in mathematics from the University of Cali
 fornia\, Los Angeles\, in 1982 and the PhD degree 
 in electrical engineering from McGill University\,
  Canada\, in 1987 (with Prof. P. Caines). He is no
 w Professor and Robert C. Pittman Eminent Scholar 
 Chair in the Department of Electrical and Computer
  Engineering at the University of Florida\, the di
 rector of the Laboratory for Cognition and Control
 \, and director of the Florida Institute for Susta
 inable Energy. His academic research interests inc
 lude theory and applications of decision and contr
 ol\, stochastic processes\, and optimization. He h
 as received many awards for his research on these 
 topics\, and is a fellow of the IEEE. He has held 
 visiting positions at universities all over the wo
 rld\, including the Indian Institute of Science\, 
 Bangalore during 1997-1998 where he was a Fulbrigh
 t Research Scholar. During his latest sabbatical d
 uring the 2006-2007 academic year he was a visitin
 g professor at MIT and United Technologies Researc
 h Center (UTRC). His award-winning 1993 monograph 
 with Richard Tweedie\, Markov Chains and Stochasti
 c Stability\, has been cited thousands of times in
  journals from a range of fields. The latest versi
 on is published in the Cambridge Mathematical Libr
 ary. For the past ten years his applied research h
 as focused on engineering\, markets\, and policy i
 n energy systems. He regularly engages in industry
 \, government\, and academic panels on these topic
 s\, and hosts an annual workshop at the University
  of Florida.\n
LOCATION:LT6\, Baker Building\, CUED
CONTACT:Prof. Ramji Venkataramanan
END:VEVENT
END:VCALENDAR