BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Isaac Newton Institute Seminar Series
SUMMARY:Stochastic Quasi-Gradient Methods: Variance Reduct
ion via Jacobian Sketching - Peter Richtarik (Univ
ersity of Edinburgh\; King Abdullah University of
Science and Technology (KAUST)\; Moscow Institute
of Physics and Technology)
DTSTART;TZID=Europe/London:20190620T095000
DTEND;TZID=Europe/London:20190620T104000
UID:TALK126271AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/126271
DESCRIPTION:We develop a new family of variance reduced stocha
stic gradient descent methods for minimizing the a
verage of a very large number of smooth functions.
Our method\, JacSketch\, is motivated b
y novel developments in randomized numerical lin
ear algebra\, and operates by maintaining a stocha
stic estimate of a Jacobian matrix composed of the
gradients of individual functions. In each iterat
ion\, JacSketch efficiently updates the Jacobian m
atrix by first obtaining a random linear measureme
nt of the true Jacobian through (cheap) sketching\
, and then projecting the previous estimate onto t
he solution space of a linear matrix equation wh
ose solutions are consistent with the measurement.
The Jacobian estimate is then used to compute a v
ariance-reduced unbiased estimator of the gradient
\, followed by a stochastic gradient descent step.
Our strategy is analogous to the way quasi-Newton
methods maintain an estimate of the Hessian\, and
hence our method can be seen as a stochastic qua
si-gradient method. Indeed\, quasi-Newton methods
project the current Hessian estimate onto a soluti
on space of a linear equation consistent with a ce
rtain linear (but non-random) measurement of the t
rue Hessian. Our method can also be seen as stocha
stic gradient descent applied to a controlled stoc
hastic optimization reformulation of the original
problem\, where the control comes from the Jacobia
n estimates. We prove that for smooth and strongly
convex functions\, JacSketch converges linearly w
ith a meaningful rate dictated by a single converg
ence theorem which applies to general sketches. We
also provide a refined convergence theorem which
applies to a smaller class of sketches\, featuring
a novel proof technique based on a stochastic Lya
punov function. This enables us to obtain sharper
complexity results for variants of JacSketch with
importance sampling. By specializing our general a
pproach to specific sketching strategies\, JacSket
ch reduces to the celebrated stochastic average gr
adient (SAGA) method\, and

Co-authors: Robe
rt Mansel Gower (Telecom ParisTech)\, Francis Bach
(INRIA - ENS - PSL Research University)

LOCATION:Seminar Room 1\, Newton Institute
CONTACT:INI IT
END:VEVENT
END:VCALENDAR