# A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization

    @article{Alecsa2021ATA,
      title={A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization},
      author={Cristian Daniel Alecsa},
      journal={ArXiv},
      year={2021},
      volume={abs/2110.08531}
    }

In this paper we introduce new adaptive algorithms endowed with momentum terms for stochastic non-convex optimization problems. We investigate almost sure convergence to stationary points, along with a finite-time horizon analysis with respect to a chosen final iteration, and we also inspect the worst-case iteration complexity. An estimate for the expectation of the squared Euclidean norm of the gradient is given, and the theoretical analysis that we perform is assisted by various…
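The abstract does not spell out the update rule, but the class of methods it describes can be illustrated with a generic sketch: a stochastic gradient step combined with a momentum buffer and an adaptive (AdaGrad-style) step size. This is an illustrative assumption, not the paper's exact algorithm; all constants and names below are hypothetical.

```python
import numpy as np

def adaptive_momentum_sgd(grad, x0, steps=200, lr=0.5, beta=0.5, eps=1e-8, seed=0):
    """Generic adaptive algorithm with a momentum term (illustrative sketch,
    not the paper's exact update): AdaGrad-style step sizes combined with a
    heavy-ball-style momentum buffer."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # momentum buffer
    v = np.zeros_like(x)   # accumulated squared gradients
    for _ in range(steps):
        g = grad(x) + 0.01 * rng.standard_normal(x.shape)  # stochastic gradient
        v += g * g                                         # adaptive accumulator
        m = beta * m + g                                   # momentum step
        x = x - lr * m / (np.sqrt(v) + eps)                # adaptive update
    return x

# Non-convex test function f(x) = (x^2 - 1)^2, with stationary points at 0 and +/-1.
x_star = adaptive_momentum_sgd(lambda x: 4 * x * (x**2 - 1), np.array([2.0]))
```

In this non-convex setting the guarantee one can hope for is exactly what the abstract studies: convergence of the gradient norm to zero, i.e. the iterate approaches some stationary point rather than a global minimum.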


## References

Showing 1–10 of 39 references

Convergence Rates of a Momentum Algorithm with Bounded Adaptive Step Size for Nonconvex Optimization

- Computer Science, Mathematics · ACML
- 2020

This work studies the Adam algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate and shows a novel first-order convergence rate result in both deterministic and stochastic contexts.
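A simple way to read the boundedness assumption is that the per-coordinate adaptive step of Adam is capped by a constant. A minimal sketch (illustrative; `step_max` and the test function are assumptions, not taken from the cited paper):

```python
import numpy as np

def adam_bounded(grad, x0, steps=300, lr=0.1, b1=0.9, b2=0.999,
                 eps=1e-8, step_max=1.0):
    """Adam with the per-coordinate adaptive step clipped to [0, step_max],
    one way to enforce a bounded-adaptive-step-size assumption."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat = m / (1 - b1**t)               # bias correction
        vhat = v / (1 - b2**t)
        step = lr / (np.sqrt(vhat) + eps)    # adaptive step size
        step = np.minimum(step, step_max)    # enforce the bound
        x = x - step * mhat
    return x

# Deterministic run on the nonconvex f(x) = x^4 - 2x^2 (minima at x = +/-1).
x_min = adam_bounded(lambda x: 4 * x**3 - 4 * x, np.array([1.8]))
```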

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

- Computer Science, Mathematics · NIPS
- 2011

This work provides a non-asymptotic analysis of the convergence of two well-known algorithms: stochastic gradient descent and a simple modification in which the iterates are averaged. The results suggest that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate, is not robust to the lack of strong convexity or to the setting of the proportionality constant.
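The averaging modification referred to here is Polyak–Ruppert averaging: run SGD and return the running mean of the iterates rather than the last one. A minimal sketch (the 1/sqrt(t) schedule and the quadratic test problem are illustrative assumptions):

```python
import numpy as np

def averaged_sgd(grad, x0, steps=500, lr0=0.5, seed=1):
    """SGD with Polyak-Ruppert iterate averaging (illustrative sketch).
    Uses a 1/sqrt(t) step size, which the cited analysis finds more robust
    than 1/t when strong convexity is absent or misestimated."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x) + 0.1 * rng.standard_normal(x.shape)  # noisy gradient
        x = x - (lr0 / np.sqrt(t)) * g
        avg += (x - avg) / t                              # running average
    return avg

# Quadratic f(x) = 0.5 * x^2 with noisy gradients; the average settles near 0.
x_avg = averaged_sgd(lambda x: x, np.array([3.0]))
```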

Asymptotic study of stochastic adaptive algorithm in non-convex landscape

- Computer Science, Mathematics · ArXiv
- 2020

This paper studies some asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them AdaGrad and RMSProp, which are involved in most of the blackbox…

A gradient-type algorithm with backward inertial steps associated to a nonconvex minimization problem

- Mathematics, Computer Science · Numerical Algorithms
- 2019

We investigate an algorithm of gradient type with a backward inertial step in connection with the minimization of a nonconvex differentiable function. We show that the generated sequences converge to…

A High Probability Analysis of Adaptive SGD with Momentum

- Computer Science, Mathematics · ArXiv
- 2020

A high-probability analysis of adaptive and momentum algorithms is presented, under weak assumptions on the function, stochastic gradients, and learning rates; it is used to prove, for the first time, convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2010

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
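In its most common diagonal form, this adaptive proximal-function machinery reduces to the familiar AdaGrad update: each coordinate's step shrinks with its own accumulated squared gradients. A minimal sketch (the ill-conditioned test problem is an illustrative assumption):

```python
import numpy as np

def adagrad(grad, x0, steps=400, lr=1.0, eps=1e-8):
    """Plain diagonal AdaGrad (illustrative): per-coordinate step sizes shrink
    with the accumulated squared gradients, so a single global lr needs far
    less tuning across coordinates of very different scales."""
    x = np.asarray(x0, dtype=float)
    G = np.zeros_like(x)                      # sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        G += g * g
        x = x - lr * g / (np.sqrt(G) + eps)   # per-coordinate adaptive step
    return x

# Ill-conditioned quadratic f(x, y) = 50*x^2 + 0.5*y^2; one lr handles both scales.
x_min = adagrad(lambda z: np.array([100.0 * z[0], 1.0 * z[1]]), np.array([2.0, 3.0]))
```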

Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming

- Mathematics, Computer Science · SIAM J. Optim.
- 2013

This paper discusses a variant of the algorithm that applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and shows that this modification significantly improves the large-deviation properties of the algorithm.

Global convergence of the Heavy-ball method for convex optimization

- Mathematics, Computer Science · 2015 European Control Conference (ECC)
- 2015

This paper establishes global convergence and provides global bounds of the rate of convergence for the Heavy-ball method for convex optimization. When the objective function has Lipschitz-continuous…
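The heavy-ball method analyzed here is Polyak's classical two-term recursion: a gradient step plus an inertial term that reuses the previous displacement. A minimal sketch (step-size and momentum constants are illustrative assumptions):

```python
import numpy as np

def heavy_ball(grad, x0, steps=200, lr=0.1, beta=0.8):
    """Polyak's heavy-ball method (illustrative): gradient descent plus an
    inertial term beta * (x_t - x_{t-1}) that reuses the last displacement."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)  # inertial update
        x_prev, x = x, x_next
    return x

# Convex quadratic f(x) = 0.5 * x^2, minimized at 0.
x_min = heavy_ball(lambda x: x, np.array([5.0]))
```

For this quadratic the iteration is linear with complex eigenvalues of modulus sqrt(beta), so the iterates spiral into the minimizer at a geometric rate, which matches the kind of global rate bound the cited paper establishes in the convex smooth case.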

Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization

- Computer Science, Mathematics · Math. Program.
- 2021

An inertial algorithm of gradient type in connection with the minimization of a nonconvex differentiable function is investigated, and it is shown that the generated sequences converge to a critical point of the objective function, provided a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property.

Adaptive Methods for Nonconvex Optimization

- Computer Science · NeurIPS
- 2018

The result implies that increasing minibatch sizes enables convergence, providing a way to circumvent the non-convergence issues; the work also proposes a new adaptive optimization algorithm, Yogi, which controls the increase in the effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
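The way Yogi "controls the increase in effective learning rate" is by moving the second-moment estimate additively instead of with Adam's exponential average, so the denominator (and hence the effective step) cannot change too quickly. A minimal sketch of that update (constants and test function are illustrative assumptions):

```python
import numpy as np

def yogi(grad, x0, steps=300, lr=0.05, b1=0.9, b2=0.999, eps=1e-3):
    """Yogi-style update (illustrative sketch): like Adam, but the second
    moment moves additively, v += (1 - b2) * sign(g^2 - v) * g^2, which caps
    how fast the effective learning rate lr / sqrt(v) can grow or shrink."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment estimate
    v = np.zeros_like(x)   # controlled second-moment estimate
    for _ in range(steps):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = v + (1 - b2) * np.sign(g * g - v) * g * g   # additive v update
        x = x - lr * m / (np.sqrt(v) + eps)
    return x

# Simple quadratic f(x) = x^2, minimized at 0.
x_min = yogi(lambda x: 2.0 * x, np.array([2.0]))
```

The only change from Adam is the `v` line; everything else (momentum, division by sqrt(v)) is the same, which is why the convergence guarantees carry over.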