Corpus ID: 212414659

On the Convergence of Adam and Adagrad

@article{Dfossez2020OnTC,
  title={On the Convergence of Adam and Adagrad},
  author={Alexandre D{\'e}fossez and L{\'e}on Bottou and Francis R. Bach and Nicolas Usunier},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.02395}
}
  • Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier
  • Published 2020
  • Mathematics, Computer Science
  • ArXiv
  • We provide a simple proof of the convergence of the optimization algorithms Adam and Adagrad under the assumptions of smooth gradients and an almost sure uniform bound on the $\ell_\infty$ norm of the gradients. This work builds on the techniques introduced by Ward et al. (2019) and extends them to the Adam optimizer. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper bound which is explicit in the constants of the problem, parameters…
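
For concreteness, the two update rules whose convergence the abstract discusses can be written down in a few lines. Below is a minimal NumPy sketch of one Adagrad step (Duchi et al., 2011) and one Adam step (Kingma & Ba, 2015). The function names, state layout, and toy usage are illustrative assumptions, not the authors' code, and the default hyperparameters are the commonly used ones rather than anything prescribed by the paper.

import numpy as np

def adagrad_step(x, grad, state, lr=0.1, eps=1e-8):
    # Adagrad: accumulate squared gradients per coordinate and divide
    # the step by the root of that running sum (Duchi et al., 2011).
    state["sum_sq"] += grad ** 2
    return x - lr * grad / (np.sqrt(state["sum_sq"]) + eps)

def adam_step(x, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: exponential moving averages of the gradient and its square,
    # with bias correction for the zero initialization (Kingma & Ba, 2015).
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return x - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage on f(x) = ||x||^2 / 2, whose gradient at x is x itself.
x = np.ones(3)
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
for _ in range(1000):
    x = adam_step(x, x, state)

Under the paper's assumptions (a smooth objective and almost surely bounded gradients), both of these update rules admit the explicit bounds described in the abstract.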

    Citations

    Publications citing this paper.

    Adaptive Gradient Methods for Constrained Convex Optimization
    Incremental Without Replacement Sampling in Nonconvex Optimization
    Dual Averaging is Surprisingly Effective for Deep Learning Optimization

    References

    Publications referenced by this paper (showing 1-10 of 19 references):

    Adam: A Method for Stochastic Optimization
    On the Convergence of Adam and Beyond
    On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
    Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
    The Marginal Value of Adaptive Gradient Methods in Machine Learning
    AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
    On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
    A Sufficient Condition for Convergences of Adam and RMSProp
    Weighted AdaGrad with Unified Momentum