• Corpus ID: 53705343

Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity

Sébastien Gadat and Fabien Panloup
arXiv: Statistics Theory
This paper is devoted to the non-asymptotic control of the mean-squared error for the Ruppert-Polyak stochastic averaged gradient descent introduced in the seminal contributions of [Rup88] and [PJ92]. In our main results, we establish tight non-asymptotic bounds (optimal with respect to the Cramér-Rao lower bound) in a very general framework that includes the uniformly strongly convex case as well as the one where the function f to be minimized satisfies a weaker Kurdyka-Łojasiewicz-type… 
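As a rough illustration of the averaging scheme the abstract refers to, here is a minimal sketch, not the authors' setting or code. It assumes a toy one-dimensional quadratic objective f(θ) = E[(θ − X)²]/2 with Gaussian X (which is strongly convex, so it does not exercise the weaker Kurdyka-Łojasiewicz-type case), and a step size of the form γ·n^(−β) with β in (1/2, 1). The names `averaged_sgd`, `noisy_grad`, `gamma` and `beta` are hypothetical and chosen only for this example.

```python
import random


def averaged_sgd(noisy_grad, theta0, n_iter, gamma=1.0, beta=0.75):
    """Sketch of SGD with Ruppert-Polyak averaging (illustrative only).

    noisy_grad: callable returning an unbiased noisy gradient at theta.
    Step size gamma * n**(-beta) is an assumption of this sketch.
    Returns the last iterate and the running average of the iterates.
    """
    theta = theta0
    theta_bar = theta0
    for n in range(1, n_iter + 1):
        step = gamma * n ** (-beta)
        # Plain stochastic gradient step.
        theta = theta - step * noisy_grad(theta)
        # Online update of the mean of theta_1, ..., theta_n.
        theta_bar += (theta - theta_bar) / n
    return theta, theta_bar


# Toy problem (assumed): minimize E[(theta - X)^2] / 2 with X ~ N(2, 1),
# whose minimizer is the mean 2.0; a noisy gradient is theta - X.
rng = random.Random(0)
last, averaged = averaged_sgd(lambda t: t - rng.gauss(2.0, 1.0), 0.0, 20000)
print(last, averaged)
```

Both returned values should land near 2.0 on this toy problem; the averaged iterate is the quantity whose mean-squared error the paper studies.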
Stochastic Heavy ball
This paper deals with a natural stochastic optimization procedure derived from the so-called Heavy-ball method differential equation, which was introduced by Polyak in the 1960s with his seminal
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
The results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without requiring any modification to either the loss function or the algorithm itself, as is typically required in robust statistics.
Optimal variance-reduced stochastic approximation in Banach spaces
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator,
On the asymptotic rate of convergence of Stochastic Newton algorithms and their Weighted Averaged versions.
The unified framework considered in this paper covers the case of linear, logistic or softmax regressions to name a few, and establishes almost sure convergences and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates.
Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient
This paper gives explicit bounds on the quadratic mean error of the estimates produced by averaged stochastic gradient algorithms under very weak assumptions, i.e., without supposing that the function to be minimized is strongly convex or admits a bounded gradient.
A stochastic Gauss-Newton algorithm for regularized semi-discrete optimal transport
A new second order stochastic algorithm to estimate the entropically regularized optimal transport cost between two probability measures that is adaptive to the geometry of the underlying convex optimization problem with no important hyperparameter to be accurately tuned is introduced.
Non asymptotic controls on a recursive superquantile approximation
A new recursive stochastic algorithm for the joint estimation of the quantile and superquantile of an unknown distribution is studied, the idea being to use the Cesàro averaging of the quantile estimation inside the recursive approximation of the superquantile.
Non asymptotic controls on a stochastic algorithm for superquantile approximation
A new recursive stochastic algorithm for the joint estimation of the quantile and superquantile of an unknown distribution is studied, the idea being to use the Cesàro averaging of the quantile estimation inside the recursive approximation of the superquantile.
Asymptotic study of stochastic adaptive algorithm in non-convex landscape
This paper studies some asymptotic properties of adaptive algorithms widely used in optimization and machine learning, and among them Adagrad and Rmsprop, which are involved in most of the blackbox
GANs Training: A Game and Stochastic Control Approach
  • 2021


Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
This work provides a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent as well as a simple modification where iterates are averaged, suggesting that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate, is not robust to the lack of strong convexity or the setting of the proportionality constant.
From error bounds to the complexity of first-order descent methods for convex functions
It is shown that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization and how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems.
Characterizations of Lojasiewicz inequalities: Subgradient flows, talweg, convexity
The classical Lojasiewicz inequality and its extensions for partial differential equation problems (Simon) and to o-minimal structures (Kurdyka) have a considerable impact on the analysis of
Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
  • F. Bach
  • Mathematics, Computer Science
    J. Mach. Learn. Res.
  • 2014
After $N$ iterations, with a constant step-size proportional to $1/(R^2\sqrt{N})$, where $N$ is the number of observations and $R$ is the maximum norm of the observations, the convergence rate is always of order $O(1/\sqrt{N})$, and improves to $O(R^2/(\mu N))$, which shows that averaged stochastic gradient is adaptive to unknown local strong convexity of the objective function.
Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls
This work studies more precisely the non-asymptotic behavior of the recursive nonlinear Robbins-Monro algorithm, giving non-asymptotic confidence balls through the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursion.
Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
A new notion of discrepancy between functions is introduced, and used to reduce problems of stochastic convex optimization to statistical parameter estimation, which can be lower bounded using information-theoretic methods.
Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm
This work focuses on the estimation of the geometric median, which is a direct generalization of the real median and has nice robustness properties; the averaged version of the algorithm has the same asymptotic normality as the classic estimators.
Central Limit Theorems for Stochastic Approximation with controlled Markov chain dynamics
This paper provides a Central Limit Theorem (CLT) for a process $\{\theta_n, n\geq 0\}$ satisfying a stochastic approximation (SA) equation of the form $\theta_{n+1} = \theta_n + \gamma_{n+1}
Introductory Lectures on Convex Optimization - A Basic Course
It was in the middle of the 1980s, when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, and it became more and more common that the new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.
Random iterative models
This book provides a wide-angle view of stochastic approximation, linear and non-linear models, controlled Markov chains, estimation and adaptive control, learning, and algorithms with good performances and reasonably easy computation.