Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
@inproceedings{Zhuang2019SurrogateLF,
  title     = {Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization},
  author    = {Zhenxun Zhuang and Ashok Cutkosky and F. Orabona},
  booktitle = {ICML},
  year      = {2019}
}
Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, for fast convergence it requires a carefully hand-picked stepsize, which is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved effective at reducing the labor of tuning in practice, but many of them lack theoretical guarantees even in the convex setting. In this paper, we…
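For context only, and not the paper's surrogate-loss method: the sketch below contrasts plain SGD with a hand-picked stepsize against an AdaGrad-style per-coordinate stepsize (Duchi et al., 2011), the kind of adaptive scheme the abstract alludes to. The toy quadratic objective, noise model, and all function names are illustrative assumptions.

```python
import numpy as np

def grad(x, rng):
    # Gradient of the toy quadratic f(x) = 0.5 * ||x||^2, plus Gaussian noise
    # to mimic a stochastic gradient oracle.
    return x + 0.1 * rng.standard_normal(x.shape)

def sgd_fixed(x0, eta, steps, seed=0):
    # Plain SGD: x_{t+1} = x_t - eta * g_t with a hand-picked stepsize eta.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        x -= eta * grad(x, rng)
    return x

def sgd_adagrad(x0, eta, steps, eps=1e-8, seed=0):
    # AdaGrad-style stepsizes: each coordinate is scaled by the inverse square
    # root of the accumulated squared gradients, so the effective stepsize
    # adapts automatically instead of being fixed in advance.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g_sq_sum = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x, rng)
        g_sq_sum += g ** 2
        x -= eta * g / (np.sqrt(g_sq_sum) + eps)
    return x

if __name__ == "__main__":
    x0 = np.ones(5)
    print("fixed stepsize :", np.linalg.norm(sgd_fixed(x0, eta=0.1, steps=200)))
    print("adagrad-style  :", np.linalg.norm(sgd_adagrad(x0, eta=1.0, steps=200)))
```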