On the Convergence Rate of Incremental Aggregated Gradient Algorithms

@article{Grbzbalaban2017OnTC,
  title={On the Convergence Rate of Incremental Aggregated Gradient Algorithms},
  author={Mert G{\"u}rb{\"u}zbalaban and Asuman E. Ozdaglar and Pablo A. Parrilo},
  journal={SIAM J. Optim.},
  year={2017},
  volume={27},
  pages={1035-1048}
}
Motivated by applications to distributed optimization over networks and large-scale data processing in machine learning, we analyze the deterministic incremental aggregated gradient method for minimizing a finite sum of smooth functions where the sum is strongly convex. This method processes the functions one at a time in a deterministic order and incorporates a memory of previous gradient values to accelerate convergence. It performs well in practice; however, no theoretical…
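As a rough illustration of the update the abstract describes (a sketch, not code from the paper), the snippet below cycles through the component gradients in a fixed order, keeps the most recent gradient of each component in a table, and steps along their sum; the quadratic components, function names, and step size are illustrative assumptions.

import numpy as np

def iag(grads, x0, step, iters):
    # Deterministic IAG sketch: cycle through the components in a fixed order,
    # keep the most recent gradient of each component in a table, and take a
    # step along the sum of the stored (possibly outdated) gradients.
    m = len(grads)
    x = x0.copy()
    table = [g(x) for g in grads]     # gradient memory, one slot per component
    agg = sum(table)                  # running sum of the stored gradients
    for k in range(iters):
        i = k % m                     # deterministic cyclic order
        new_g = grads[i](x)
        agg = agg + new_g - table[i]  # refresh one slot and the running sum
        table[i] = new_g
        x = x - step * agg            # aggregated gradient step
    return x

# Toy strongly convex instance (illustrative): f_i(x) = 0.5 * ||A_i x - b_i||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4, 3))
b = rng.standard_normal((5, 4))
grads = [lambda x, Ai=A[i], bi=b[i]: Ai.T @ (Ai @ x - bi) for i in range(5)]
x_hat = iag(grads, np.zeros(3), step=0.01, iters=5000)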


Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods
TLDR
This paper is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order, and shows that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small.
Global convergence rate of incremental aggregated gradient methods for nonsmooth problems
TLDR
The first linear convergence rate result for the PIAG method is shown and explicit convergence rate estimates are provided that highlight the dependence on the condition number of the problem and the size of the window K over which outdated component gradients are evaluated.
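For orientation, the PIAG iteration analyzed in the two entries above is commonly written as follows for the composite problem $\min_x \sum_{i=1}^{m} f_i(x) + h(x)$; the delay notation $\tau_i^k$ and the window bound $K$ are standard conventions assumed here, not quotations from either paper:

$$
x^{k+1} = \operatorname{prox}_{\gamma h}\!\left( x^{k} - \gamma \sum_{i=1}^{m} \nabla f_i\big(x^{\tau_i^{k}}\big) \right),
\qquad 0 \le k - \tau_i^{k} \le K,
$$

so each stored component gradient may be at most K iterations out of date.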
A double incremental aggregated gradient method with linear convergence rate for large-scale optimization
TLDR
It is proved that not only does the proposed DIAG method converge linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over full-batch gradient descent.
Can speed up the convergence rate of stochastic gradient methods to O(1/k^2) by a gradient averaging strategy?
In this paper we consider the question of whether it is possible to apply a gradient averaging strategy to improve on the sublinear convergence rates without any increase in storage. Our analysis…
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
TLDR
A distributed asynchronous quasi-Newton algorithm that achieves superlinear convergence guarantees is developed; it is believed to be the first distributed asynchronous algorithm with superlinear convergence guarantees.
Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity
TLDR
This paper develops a framework for analysing the non-asymptotic convergence rates of IGMs when they are applied to a class of structured convex optimization problems that includes least squares regression and logistic regression and demonstrates the power of the framework by proving new linear convergence results for three recently proposed algorithms.
General Proximal Incremental Aggregated Gradient Algorithms: Better and Novel Results under General Scheme
TLDR
The novel results presented in this paper, which have not appeared in previous literature, include: a general scheme, nonconvex analysis, the sublinear convergence rates of the function values, much larger stepsizes that guarantee the convergence, the convergence when noise exists.
Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate
TLDR
A Double Incremental Aggregated Gradient (DIAG) method is proposed that computes the gradient of only one function at each iteration, chosen according to a cyclic scheme, and uses the aggregated average gradient of all the functions to approximate the full gradient.
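As a rough sketch of the "double" aggregation (notation assumed here rather than quoted from the paper), the method averages both the stored iterates and the stored component gradients,

$$
x^{k+1} = \frac{1}{m} \sum_{i=1}^{m} x^{\tau_i^{k}} \,-\, \frac{\alpha}{m} \sum_{i=1}^{m} \nabla f_i\big(x^{\tau_i^{k}}\big),
$$

with the refreshed index chosen cyclically, so only one new component gradient is evaluated per iteration.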

References

Showing 1-10 of 34 references
Convergence Rate of Incremental Gradient and Newton Methods
TLDR
This paper presents fast convergence results for the incremental gradient and incremental Newton methods under constant and diminishing stepsizes, and shows that to achieve the fastest 1/k rate the incremental gradient method needs a stepsize tuned to the strong convexity parameter, whereas the incremental Newton method does not.
A globally convergent incremental Newton method
TLDR
It is shown that the incremental Newton method for minimizing the sum of a large number of strongly convex functions is globally convergent for a variable stepsize rule, and that under a gradient growth condition the convergence rate is linear for both variable and constant stepsize rules.
A delayed proximal gradient method with linear convergence rate
TLDR
This paper derives an explicit expression that quantifies how the convergence rate depends on objective function properties and algorithm parameters such as step-size and the maximum delay, and reveals the trade-off between convergence speed and residual error.
Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero (M. Solodov, Comput. Optim. Appl., 1998)
TLDR
The first convergence results of any kind for this computationally important case are derived: it is shown that a certain ε-approximate solution can be obtained, and the linear dependence of ε on the stepsize limit is established.
A Convergent Incremental Gradient Method with a Constant Step Size
TLDR
An incremental aggregated gradient method for minimizing a sum of continuously differentiable functions is presented, and it is shown that the method visits regions in which the gradient is small infinitely often.
Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
TLDR
It is shown that under these assumptions the basic stochastic gradient method with a sufficiently small constant step size has an O(1/k) convergence rate, and has a linear convergence rate if the objective is strongly convex.
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
TLDR
A new stochastic gradient method is proposed for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex; it incorporates a memory of previous gradient values in order to achieve a linear convergence rate.
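A minimal sketch of that memory idea, in the same illustrative style as the IAG sketch near the top of the page: the structural change is that the refreshed component is chosen uniformly at random and the step uses the average rather than the sum of the stored gradients (names, seed, and step size are again assumptions, not the paper's code).

import numpy as np

def sag_sketch(grads, x0, step, iters, seed=0):
    # Stochastic aggregated-gradient sketch: refresh one randomly chosen
    # component gradient per iteration and step along the average of the
    # stored gradients (the "memory of previous gradient values").
    rng = np.random.default_rng(seed)
    m = len(grads)
    x = x0.copy()
    table = [g(x) for g in grads]        # gradient memory, one slot per component
    avg = sum(table) / m                 # running average of the stored gradients
    for _ in range(iters):
        i = rng.integers(m)              # uniformly random component choice
        new_g = grads[i](x)
        avg = avg + (new_g - table[i]) / m   # O(d) update of the average
        table[i] = new_g
        x = x - step * avg
    return x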
Incrementally Updated Gradient Methods for Constrained and Regularized Optimization (P. Tseng, S. Yun, J. Optim. Theory Appl., 2014)
TLDR
Every cluster point of the iterates generated by the method is a stationary point, and if in addition a local Lipschitz error bound assumption holds, then the method is linearly convergent.
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
TLDR
A unified algorithmic framework is introduced for incremental methods for minimizing a sum $\sum_{i=1}^{m} f_i(x)$ consisting of a large number of convex component functions $f_i$, including the advantages offered by randomization in the selection of components.
Gradient methods for minimizing composite objective function
In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…