• Corpus ID: 239016818

A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization

@article{Alecsa2021ATA,
  title={A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization},
  author={Cristian Daniel Alecsa},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.08531}
}
  • C. Alecsa
  • Published 16 October 2021
  • Computer Science, Mathematics
  • ArXiv
In the following paper we introduce new adaptive algorithms endowed with momentum terms for stochastic non-convex optimization problems. We investigate the almost sure convergence to stationary points, along with a finite-time horizon analysis with respect to a chosen final iteration, and we also inspect the worst-case iteration complexity. An estimate for the expectation of the squared Euclidean norm of the gradient is given and the theoretical analysis that we perform is assisted by various… 
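The abstract does not reproduce the proposed update rules, so purely for orientation, below is a minimal sketch of a generic stochastic adaptive update with a momentum term (an Adam/RMSProp-style step), not the specific shifted-update schemes introduced in the paper; the function name and hyperparameter values are illustrative placeholders.

```python
import numpy as np

def adaptive_momentum_step(x, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One generic adaptive step with a momentum term (Adam-style illustration).

    x, grad, m, v are NumPy arrays of the same shape; m is the momentum
    (first-moment) buffer and v the adaptive scaling (second-moment) buffer.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # momentum term
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # adaptive (per-coordinate) scaling
    x = x - lr * m / (np.sqrt(v) + eps)         # adaptively scaled update
    return x, m, v
```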

References

Showing 1-10 of 39 references
Convergence Rates of a Momentum Algorithm with Bounded Adaptive Step Size for Nonconvex Optimization
This work studies the Adam algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate and shows a novel first-order convergence rate result in both the deterministic and stochastic settings.
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
This work provides a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent and a simple modification in which the iterates are averaged, and suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate, is not robust to the lack of strong convexity or to the choice of the proportionality constant.
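For reference, a minimal sketch of the averaged scheme compared there: SGD with a step size proportional to 1/t, with a Polyak-Ruppert average of the iterates maintained alongside the last iterate; the stochastic gradient oracle and the constant c are placeholders.

```python
import numpy as np

def averaged_sgd(x0, stochastic_grad, n_iters, c=1.0):
    """SGD with step size c/t, returning both the last iterate and the iterate average."""
    x = np.asarray(x0, dtype=float)
    x_avg = x.copy()
    for t in range(1, n_iters + 1):
        x = x - (c / t) * stochastic_grad(x)  # learning rate proportional to 1/t
        x_avg += (x - x_avg) / t              # running (Polyak-Ruppert) average
    return x, x_avg
```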
Asymptotic study of stochastic adaptive algorithm in non-convex landscape
This paper studies asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them AdaGrad and RMSProp, which are involved in most of the blackbox…
A gradient-type algorithm with backward inertial steps associated to a nonconvex minimization problem
We investigate an algorithm of gradient type with a backward inertial step in connection with the minimization of a nonconvex differentiable function. We show that the generated sequences converge to…
A High Probability Analysis of Adaptive SGD with Momentum
A high-probability analysis for adaptive and momentum algorithms, under weak assumptions on the function, the stochastic gradients, and the learning rates, is presented and used to prove, for the first time, convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that could be chosen in hindsight.
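A minimal sketch of the diagonal AdaGrad update this refers to, where each coordinate's step size shrinks with the accumulated squared gradients; the base learning rate and eps are placeholder values.

```python
import numpy as np

def adagrad_step(x, grad, sum_sq, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step with per-coordinate adaptive scaling."""
    sum_sq = sum_sq + grad ** 2                   # accumulated squared gradients
    x = x - lr * grad / (np.sqrt(sum_sq) + eps)   # coordinate-wise adaptive step
    return x, sum_sq
```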
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
This paper discusses a variant of the algorithm that applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and shows that such a modification significantly improves the large-deviation properties of the algorithm.
Global convergence of the Heavy-ball method for convex optimization
This paper establishes global convergence and provides global bounds on the rate of convergence for the Heavy-ball method for convex optimization. When the objective function has Lipschitz-continuous…
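For orientation, the classical Heavy-ball iteration analyzed there combines a gradient step with an inertial term built from the previous iterate; a minimal sketch with placeholder step-size and momentum parameters:

```python
import numpy as np

def heavy_ball(grad_f, x0, alpha=0.01, beta=0.9, n_iters=1000):
    """Polyak's Heavy-ball method: x_{k+1} = x_k - alpha*grad_f(x_k) + beta*(x_k - x_{k-1})."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iters):
        x_next = x - alpha * grad_f(x) + beta * (x - x_prev)  # gradient step + inertia
        x_prev, x = x, x_next
    return x
```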
Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization
  • S. László
  • Computer Science, Mathematics
  • Math. Program.
  • 2021
An inertial algorithm of gradient type, in connection with the minimization of a nonconvex differentiable function, is investigated, and it is shown that the generated sequences converge to a critical point of the objective function if a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property.
Adaptive Methods for Nonconvex Optimization
The result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the non-convergence issues, and the work proposes a new adaptive optimization algorithm, Yogi, which controls the increase in the effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
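Yogi differs from Adam only in how the second-moment estimate is accumulated, using an additive, sign-controlled correction so the effective learning rate cannot grow too quickly; a minimal sketch following the update described in that paper, with illustrative hyperparameter values:

```python
import numpy as np

def yogi_step(x, grad, m, v, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-3):
    """One Yogi step: sign-controlled additive update of the second-moment estimate."""
    g2 = grad ** 2
    m = beta1 * m + (1.0 - beta1) * grad           # momentum (first moment)
    v = v - (1.0 - beta2) * np.sign(v - g2) * g2   # Yogi second-moment update
    x = x - lr * m / (np.sqrt(v) + eps)            # Adam-style adaptive step
    return x, m, v
```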