BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Microsoft Research Cambridge\, public talks
SUMMARY:DiCE: The Infinitely Differentiable Monte-Carlo Es
 timator - Jakob Foerster\, University of Oxford
DTSTART;TZID=Europe/London:20180619T130000
DTEND;TZID=Europe/London:20180619T140000
UID:TALK107470AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/107470
DESCRIPTION:The score function estimator is widely used for es
 timating gradients of stochastic objectives in Sto
 chastic Computation Graphs (SCG)\, e.g. in reinforc
 ement learning and meta-learning. While deriving t
 he first-order gradient estimators by differentiat
 ing a surrogate loss (SL) objective is computation
 ally and conceptually simple\, using the same appr
 oach for higher-order gradients is more challengin
 g. Firstly\, analytically deriving and implementin
 g such estimators is laborious and not compliant w
 ith automatic differentiation. Secondly\, repeated
 ly applying SL to construct new objectives for eac
 h order gradient involves increasingly cumbersome
  graph manipulations. Lastly\, to match the first-o
 rder gradient under differentiation\, SL treats pa
 rt of the cost as a fixed sample\, which we show l
 eads to missing and wrong terms for higher-order g
 radient estimators. To address all these shortcomi
 ngs in a unified way\, we introduce DiCE\, which p
 rovides a single objective that can be differentia
 ted repeatedly\, generating correct gradient estim
 ators of any order in SCGs. Unlike SL\, DiCE relie
 s on automatic differentiation for performing the
  requisite graph manipulations. We verify the corre
 ctness of DiCE both through a proof and through nu
 merical evaluation of the DiCE gradient estimates.
  We also use DiCE to propose and evaluate a novel
  approach for multi-agent learning. Our code is ava
 ilable at this URL
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station R
 oad\, Cambridge\, CB1 2FB
CONTACT:Microsoft Research Cambridge Talks Admins
END:VEVENT
END:VCALENDAR