

## On momentum methods and acceleration in stochastic optimization

- Praneeth Netrapalli (Microsoft Research India)
- Monday 21 May 2018, 16:00-17:00
- MR14
If you have a question about this talk, please contact Quentin Berthet.

It is well known that momentum gradient methods (e.g., Polyak's heavy ball, Nesterov's acceleration) yield significant improvements over vanilla gradient descent in deterministic optimization (i.e., where we have access to exact gradients of the function to be minimized). However, there is widespread sentiment that these momentum methods are not effective for stochastic optimization, due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical errors (Paige, 1971; Proakis, 1974; Polyak, 1987; Greenbaum, 1989; Roy and Shynk, 1990; Sharma et al., 1998; d'Aspremont, 2008; Devolder et al., 2013, 2014; Yuan et al., 2016), but a precise understanding is lacking.

This work considers these issues for the special case of stochastic approximation for the linear least squares regression problem, and shows that classical momentum methods (heavy ball and Nesterov's acceleration) indeed do not offer any improvement over stochastic gradient descent. It then introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent (and classical momentum methods). Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. While the results are rigorously established for the special case of linear least squares regression, experiments suggest that the conclusions hold for the training of deep neural networks.

Joint work with Prateek Jain, Sham M. Kakade, Rahul Kidambi and Aaron Sidford.

This talk is part of the Statistics series.

## This talk is included in these lists:

- All CMS events
- All Talks (aka the CURE list)
- CMS Events
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- DPMMS Lists
- DPMMS info aggregator
- DPMMS lists
- Guy Emerson's list
- Hanchen DaDaDash
- Interested Talks
- MR14
- Machine Learning
- School of Physical Sciences
- Statistical Laboratory info aggregator
- Statistics
- Statistics Group
- bld31
- custom
- rp587
Note that ex-directory lists are not shown.
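For readers unfamiliar with the methods compared in the abstract, the two deterministic baselines can be sketched on a small linear least squares problem. This is an illustrative sketch only: the step size `eta` and momentum `beta` are arbitrary choices, not the tuned parameters from the paper, and the noiseless full-gradient setting below sidesteps exactly the stochastic-error issues the talk is about.

```python
import numpy as np

# Linear least squares: minimize f(w) = (1/2n) * ||X w - y||^2.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true  # noiseless targets, so the minimizer is w_true

def grad(w):
    # Exact (full-batch) gradient of f at w.
    return X.T @ (X @ w - y) / n

eta, beta = 0.1, 0.9  # illustrative step size and momentum

# Vanilla gradient descent: w_{t+1} = w_t - eta * grad(w_t).
w = np.zeros(d)
for _ in range(500):
    w = w - eta * grad(w)

# Polyak's heavy ball: add a momentum term beta * (w_t - w_{t-1}).
v, v_prev = np.zeros(d), np.zeros(d)
for _ in range(500):
    v, v_prev = v - eta * grad(v) + beta * (v - v_prev), v

gd_err = np.linalg.norm(w - w_true)
hb_err = np.linalg.norm(v - w_true)
```

Replacing `grad(w)` with a single-sample estimate turns gradient descent into SGD and heavy ball into its stochastic variant; the talk's point is that the momentum term then stops helping unless the method is redesigned.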