Incremental Without Replacement Sampling in Nonconvex Optimization

  • Edouard Pauwels
  • Published 2021
  • Computer Science, Mathematics
  • J. Optim. Theory Appl.
Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hand, modern implementations of such techniques are incremental: they rely on sampling without replacement. We reduce this gap between theory and common usage by analysing a versatile incremental gradient scheme. We consider constant, decreasing, or adaptive step sizes. In the smooth setting we obtain explicit rates and…
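The without-replacement (incremental) scheme the abstract contrasts with stochastic approximation can be illustrated with a minimal sketch. This is an illustrative toy, not the paper's exact algorithm: each epoch reshuffles the component indices and visits every component gradient exactly once, with a constant step size.

```python
import random

def incremental_gradient(grads, x0, step, epochs, seed=0):
    """Random-reshuffling incremental gradient: each epoch visits
    every component gradient exactly once in a fresh random order
    (sampling without replacement), rather than drawing indices
    independently with replacement."""
    rng = random.Random(seed)
    x = x0
    order = list(range(len(grads)))
    for _ in range(epochs):
        rng.shuffle(order)              # one without-replacement pass
        for i in order:
            x = x - step * grads[i](x)
    return x

# Toy finite sum: f(x) = sum_i (x - c_i)^2 / 2, minimized at mean(c_i).
centers = [1.0, 2.0, 3.0, 6.0]
grads = [lambda x, c=c: x - c for c in centers]
x_star = incremental_gradient(grads, x0=0.0, step=0.05, epochs=200)
print(x_star)  # close to mean(centers) = 3.0, up to an O(step) bias
```

With a constant step the iterates settle near the minimizer with a small step-size-dependent bias; decreasing or adaptive step sizes, as considered in the paper, remove it.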


Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that could be chosen in hindsight.
Stochastic Subgradient Method Converges on Tame Functions
It is proved that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary.
Incremental Subgradient Methods for Nondifferentiable Optimization
A number of variants of incremental subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions are established, including some that are stochastic.
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
A unified algorithmic framework is introduced for incremental methods for minimizing a sum ∑_{i=1}^{m} f_i(x) consisting of a large number of convex component functions f_i, including the advantages offered by randomization in the selection of components.
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
The convergence of AdaGrad-Norm is robust to the choice of all hyper-parameters of the algorithm, in contrast to stochastic gradient descent, whose convergence depends crucially on tuning the step size to the Lipschitz smoothness constant and the level of stochastic noise on the gradient.
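For intuition about the AdaGrad-Norm stepsizes discussed above, here is a hypothetical minimal sketch (not the paper's exact formulation): a single scalar accumulator of squared gradient norms scales the step size, so no Lipschitz constant needs to be known in advance.

```python
import math

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, iters=500):
    """AdaGrad-Norm sketch: the effective step size eta / sqrt(b2)
    shrinks automatically as squared gradient norms accumulate,
    making convergence robust to the choice of eta."""
    x, b2 = x0, b0 ** 2
    for _ in range(iters):
        g = grad(x)
        b2 += g * g                       # accumulate squared gradient norm
        x = x - (eta / math.sqrt(b2)) * g
    return x

# Toy problem: f(x) = (x - 4)^2 / 2, so grad(x) = x - 4.
x_star = adagrad_norm(lambda x: x - 4.0, x0=0.0)
print(x_star)  # approaches the minimizer 4.0 without any step-size tuning
```

Once the iterates approach a stationary point, the accumulator plateaus and the method behaves like gradient descent with a small constant step, which is the mechanism behind the robustness claim.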
An Inertial Newton Algorithm for Deep Learning
We introduce a new second-order inertial optimization method for machine learning called INDIAN. It exploits the geometry of the loss function while only requiring stochastic approximations of the…
Proximal alternating linearized minimization for nonconvex and nonsmooth problems
A self-contained convergence analysis framework is derived, and it is established that each bounded sequence generated by PALM globally converges to a critical point.
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
This paper theoretically analyzes, in the convex and non-convex settings, a generalized version of the AdaGrad stepsizes, and shows sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first guarantee for generalized AdaGrad stepsizes in the non-convex setting.
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
This work provides a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent as well as a simple modification where iterates are averaged, suggesting that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate, is not robust to the lack of strong convexity or the setting of the proportionality constant.
Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning
This work introduces generalized derivatives called conservative fields, for which a calculus is developed, and provides variational formulas for nonsmooth automatic differentiation oracles, such as the famous backpropagation algorithm in deep learning.