# On the Convergence Rate of Incremental Aggregated Gradient Algorithms

@article{Grbzbalaban2017OnTC,
  title={On the Convergence Rate of Incremental Aggregated Gradient Algorithms},
  author={Mert G{\"u}rb{\"u}zbalaban and Asuman E. Ozdaglar and Pablo A. Parrilo},
  journal={SIAM J. Optim.},
  year={2017},
  volume={27},
  pages={1035--1048}
}

Motivated by applications to distributed optimization over networks and large-scale data processing in machine learning, we analyze the deterministic incremental aggregated gradient (IAG) method for minimizing a finite sum of smooth functions whose sum is strongly convex. The method processes the component functions one at a time in a deterministic order and maintains a memory of previous gradient values to accelerate convergence. It performs well empirically; however, no theoretical…
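The update the abstract describes (refresh one component gradient per iteration, step along the aggregate of the remembered gradients) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation; the function name `iag` and its arguments are invented for the example.

```python
import numpy as np

def iag(grads, x0, step, n_iters):
    """Incremental aggregated gradient (IAG) sketch.

    grads: list of per-component gradient callables grad_i(x).
    Keeps a table with the most recent gradient of each component;
    each iteration refreshes one entry in cyclic (deterministic)
    order and steps along the aggregated sum.
    """
    m = len(grads)
    x = np.asarray(x0, dtype=float)
    table = [g(x) for g in grads]      # gradient memory, one slot per component
    agg = np.sum(table, axis=0)        # running aggregate of stored gradients
    for k in range(n_iters):
        i = k % m                      # deterministic cyclic order
        new_g = grads[i](x)
        agg = agg + new_g - table[i]   # update aggregate in O(d), not O(md)
        table[i] = new_g
        x = x - step * agg             # gradient step on the full sum
    return x
```

On a toy strongly convex sum such as f_i(x) = (x - a_i)^2 / 2, the iterates converge linearly to the mean of the a_i when the step size is small enough, consistent with the linear-rate results discussed here.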


## 122 Citations

Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods

- Computer Science, Mathematics
- SIAM J. Optim.
- 2018

This paper is the first study to establish the convergence rate properties of the PIAG method for any deterministic order, and it shows that the PIAG algorithm is globally convergent with a linear rate provided the step size is sufficiently small.

Global convergence rate of incremental aggregated gradient methods for nonsmooth problems

- Computer Science, Mathematics
- 2016 IEEE 55th Conference on Decision and Control (CDC)
- 2016

The first linear convergence rate result for the PIAG method is shown and explicit convergence rate estimates are provided that highlight the dependence on the condition number of the problem and the size of the window K over which outdated component gradients are evaluated.

A double incremental aggregated gradient method with linear convergence rate for large-scale optimization

- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

It is proved not only that the proposed DIAG method converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over full batch gradient descent.

Can speed up the convergence rate of stochastic gradient methods to O(1/k²) by a gradient averaging strategy?

- Mathematics
- ArXiv
- 2020

In this paper we consider the question of whether it is possible to apply a gradient averaging strategy to improve on the sublinear convergence rates without any increase in storage. Our analysis…

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate

- Computer Science, Mathematics
- AISTATS
- 2020

A distributed asynchronous quasi-Newton algorithm that achieves superlinear convergence guarantees is developed; it is believed to be the first distributed asynchronous algorithm with superlinear convergence guarantees.

Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity

- Computer Science
- Optim. Methods Softw.
- 2017

This paper develops a framework for analysing the non-asymptotic convergence rates of IGMs when applied to a class of structured convex optimization problems that includes least squares regression and logistic regression, and demonstrates the power of the framework by proving new linear convergence results for three recently proposed algorithms.

General Proximal Incremental Aggregated Gradient Algorithms: Better and Novel Results under General Scheme

- Computer Science
- NeurIPS
- 2019

The novel results presented in this paper, which have not appeared in previous literature, include a general scheme, a nonconvex analysis, sublinear convergence rates for the function values, much larger step sizes that still guarantee convergence, and convergence in the presence of noise.

Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

- Computer Science
- SIAM J. Optim.
- 2018

This paper proposes a Double Incremental Aggregated Gradient (DIAG) method that computes the gradient of only one function at each iteration, chosen by a cyclic scheme, and uses the aggregated average gradient of all the functions to approximate the full gradient.
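One reading of the "double" idea in the blurb above is that the method keeps both a stored iterate and a stored gradient per component and steps from their averages. The sketch below is an illustrative interpretation under that assumption; `diag_method` and its argument names are invented here, not the authors' implementation.

```python
import numpy as np

def diag_method(grads, x0, step, n_iters):
    """Sketch of a double incremental aggregated gradient scheme.

    grads: list of per-component gradient callables grad_i(x).
    Keeps a copy y_i of the iterate and a gradient g_i for each
    component; one pair is refreshed per iteration in cyclic order,
    and the step is taken from the average stored iterate along the
    average stored gradient.
    """
    m = len(grads)
    y = [np.asarray(x0, dtype=float) for _ in range(m)]  # stored iterates
    g = [grads[i](y[i]) for i in range(m)]               # stored gradients
    for k in range(n_iters):
        x = np.mean(y, axis=0) - step * np.mean(g, axis=0)
        i = k % m              # cyclic component selection
        y[i] = x               # refresh stored iterate for component i
        g[i] = grads[i](x)     # only one new gradient per iteration
    return np.mean(y, axis=0) - step * np.mean(g, axis=0)
```

On the same toy quadratic sum as before (gradients x - a_i), this sketch converges linearly to the mean of the a_i, matching the kind of linear-rate behavior the citation claims for DIAG.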

## References

Showing 1–10 of 34 references

Convergence Rate of Incremental Gradient and Newton Methods

- Computer Science
- 2015

This paper presents fast convergence results for the incremental gradient and incremental Newton methods under constant and diminishing stepsizes, and shows that to achieve the fastest 1/k rate, the incremental gradient method needs a stepsize tuned to the strong convexity parameter, whereas the incremental Newton method does not.

A globally convergent incremental Newton method

- Computer Science, Mathematics
- Math. Program.
- 2015

It is shown that the incremental Newton method for minimizing the sum of a large number of strongly convex functions is globally convergent under a variable stepsize rule, and that, under a gradient growth condition, the convergence rate is linear for both variable and constant stepsize rules.

A delayed proximal gradient method with linear convergence rate

- Computer Science
- 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
- 2014

This paper derives an explicit expression that quantifies how the convergence rate depends on objective function properties and algorithm parameters such as step-size and the maximum delay, and reveals the trade-off between convergence speed and residual error.

Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero

- Computer Science
- Comput. Optim. Appl.
- 1998

The first convergence results of any kind for this computationally important case are derived: it is shown that a certain ε-approximate solution can be obtained, and the linear dependence of ε on the stepsize limit is established.

A Convergent Incremental Gradient Method with a Constant Step Size

- Mathematics, Computer Science
- SIAM J. Optim.
- 2007

An incremental aggregated gradient method for minimizing a sum of continuously differentiable functions is proposed, and it is shown that the method visits infinitely often regions in which the gradient is small.

Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

- Mathematics, Computer Science
- 2013

It is shown that under these assumptions the basic stochastic gradient method with a sufficiently small constant step size has an O(1/k) convergence rate, and has a linear convergence rate if the objective g is strongly convex.

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

- Computer Science
- NIPS
- 2012

A new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex, which incorporates a memory of previous gradient values in order to achieve a linear convergence rate.

Incrementally Updated Gradient Methods for Constrained and Regularized Optimization

- Mathematics, Computer Science
- J. Optim. Theory Appl.
- 2014

Every cluster point of the iterates generated by the method is a stationary point, and if in addition a local Lipschitz error bound assumption holds, then the method is linearly convergent.

Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey

- Computer Science, Mathematics
- ArXiv
- 2015

A unified algorithmic framework is introduced for incremental methods for minimizing a sum ∑_{i=1}^{m} f_i(x) consisting of a large number of convex component functions f_i, including the advantages offered by randomization in the selection of components.

Gradient methods for minimizing composite objective function

- Computer Science, Mathematics
- 2007

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…