Corpus ID: 65455367

On the Convergence of Adam and Beyond

@article{Reddi2018OnTC,
  title={On the Convergence of Adam and Beyond},
  author={Sashank J. Reddi and S. Kale and S. Kumar},
  journal={ArXiv},
  year={2018},
  volume={abs/1904.09237}
}
Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. [...] Our analysis suggests that the convergence issues may be fixed by endowing such algorithms with "long-term memory" of past gradients, and we propose new variants of the Adam algorithm which not only fix the convergence…
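To make the abstract's description concrete, the following is a minimal NumPy sketch (not the authors' code) of one parameter update for an Adam-style method, which scales steps by the square root of an exponential moving average of squared past gradients, and for a "long-term memory" variant in the spirit of the paper's AMSGrad fix, which keeps a running maximum of that average. Function and variable names are illustrative only.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style step: scale the update by the square root of an
    exponential moving average of squared past gradients."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

def long_memory_step(param, grad, m, v, v_max, lr=1e-3,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """"Long-term memory" variant (AMSGrad-style): take the running maximum
    of the second moment so large past gradients are never forgotten and the
    effective per-coordinate step size is non-increasing."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_max = np.maximum(v_max, v)              # never shrink the denominator
    param = param - lr * m / (np.sqrt(v_max) + eps)
    return param, m, v, v_max
```

The only difference between the two sketches is the `np.maximum` line: by preventing the second-moment estimate from decreasing, the variant avoids the growing effective learning rates that cause Adam's convergence failures in the paper's counterexamples.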
1,084 Citations
Adam revisited: a weighted past gradients perspective
ADAPTIVE LEARNING RATE METHODS
A Sufficient Condition for Convergences of Adam and RMSProp
Convergence Guarantees for RMSProp and ADAM in Non-Convex Optimization and an Empirical Comparison to Nesterov Acceleration
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
On the Convergence of AdaBound and its Connection to SGD
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
...

References

Adam: A Method for Stochastic Optimization
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Adaptive and Self-Confident On-Line Learning Algorithms
On the generalization ability of on-line learning algorithms
Dropout: a simple way to prevent neural networks from overfitting
Adaptive Bound Optimization for Online Convex Optimization
ImageNet classification with deep convolutional neural networks
ADADELTA: An Adaptive Learning Rate Method
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
Incorporating Nesterov Momentum into Adam