# Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

@article{Mokhtari2018SurpassingGD,
  title   = {Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate},
  author  = {Aryan Mokhtari and Mert G{\"u}rb{\"u}zbalaban and Alejandro Ribeiro},
  journal = {ArXiv},
  year    = {2018},
  volume  = {abs/1611.00347}
}

Recently, there has been growing interest in developing optimization methods for solving large-scale machine learning problems. Most of these problems boil down to minimizing an average of a finite set of smooth and strongly convex functions, where the number of functions $n$ is large. The gradient descent (GD) method is successful in minimizing convex problems at a fast linear rate; however, it is not applicable to the considered large-scale optimization setting because of the high…
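For context, the finite-sum objective and a plain cyclic incremental gradient update can be sketched as follows. This is a minimal toy sketch of the setting, not the paper's proposed DIAG method; the function names and the quadratic example are illustrative assumptions.

```python
import numpy as np

def cyclic_incremental_gradient(grads, x0, alpha, epochs):
    """Minimize (1/n) * sum_i f_i(x) by taking one component-gradient
    step per iteration, visiting the components in a fixed cyclic order."""
    x = x0.copy()
    n = len(grads)
    for _ in range(epochs):
        for i in range(n):                  # deterministic cyclic pass
            x = x - alpha * grads[i](x)
    return x

# Toy problem: f_i(x) = 0.5 * (x - b_i)^2, so the minimizer is mean(b) = 2.5.
b = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, bi=bi: (x - bi) for bi in b]
x_hat = cyclic_incremental_gradient(grads, np.array([0.0]), alpha=0.05, epochs=200)
# With a constant stepsize, plain cyclic incremental gradient converges only
# to a small neighborhood of the minimizer: x_hat is close to, but not
# exactly, 2.5.
```

The residual bias of the constant-stepsize cyclic pass is the kind of gap that aggregated incremental variants, such as the paper's method, aim to close while retaining a linear rate.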

#### 28 Citations

IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate

- Mathematics, Computer Science
- SIAM J. Optim.
- 2018

IQN is the first stochastic quasi-Newton method proven to converge superlinearly in a local neighborhood of the optimal solution and establishes its local superlinear convergence rate.

Curvature-aided incremental aggregated gradient method

- Computer Science, Mathematics
- 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2017

The curvature-aided incremental aggregated gradient method exploits the incrementally aggregated Hessian matrix to trace the full gradient vector at every incremental step, thereby achieving an improved linear convergence rate over state-of-the-art IAG methods.

Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate

- 2020

Finite-sum minimization, i.e., the class of problems where the objective may be written as the sum of a collection of instantaneous costs, is ubiquitous in modern machine learning and data science. Efficient…

Efficient Methods For Large-Scale Empirical Risk Minimization

- Computer Science
- 2017

This thesis rethinks ERM by considering, instead of a partition of the training set as in stochastic and distributed optimization, a nested collection of subsets that grow geometrically; it also introduces an incremental method that exploits memory to achieve a superlinear convergence rate.

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

- Computer Science, Mathematics
- ArXiv
- 2019

This work presents two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions and establishes local linear-quadratic convergence results.

Convergence rates of subgradient methods for quasi-convex optimization problems

- Mathematics, Computer Science
- Comput. Optim. Appl.
- 2020

This paper investigates the iteration complexity and convergence rates of various subgradient methods for solving quasi-convex optimization problems in a unified framework; it considers a sequence satisfying a general (inexact) basic inequality and establishes global convergence theorems and iteration complexity under constant, diminishing, or dynamic stepsize rules.

Incremental Methods for Weakly Convex Optimization

- Mathematics, Computer Science
- ArXiv
- 2019

It is shown that all three incremental algorithms, with a geometrically diminishing stepsize and an appropriate initialization, converge to the optimal solution set when the weakly convex function satisfies an additional regularity condition called sharpness.

Curvature-aided Incremental Aggregated Gradient Method

- 2017

We propose a new algorithm for finite sum optimization which we call the curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the problem of training a classifier for a…

Stochastic Quasi-Newton Methods

- Computer Science
- Proceedings of the IEEE
- 2020

Recent developments to accelerate the convergence of stochastic optimization through the exploitation of second-order information are discussed along with the introduction of an incremental method that exploits memory to achieve a superlinear convergence rate.

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate

- Mathematics, Computer Science
- AISTATS
- 2020

A distributed asynchronous quasi-Newton algorithm with superlinear convergence guarantees is developed, believed to be the first distributed asynchronous algorithm with such guarantees.

#### References

Showing 1–10 of 43 references

Incremental Subgradient Methods for Nondifferentiable Optimization

- Computer Science, Mathematics
- SIAM J. Optim.
- 2001

A number of variants of incremental subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions are established, including some that are stochastic.

A double incremental aggregated gradient method with linear convergence rate for large-scale optimization

- Mathematics, Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

It is proved not only that the proposed DIAG method converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over full-batch gradient descent.

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

- Computer Science, Mathematics
- SIAM J. Optim.
- 2015

This work proposes an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning, and presents convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error.

Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods

- Mathematics, Computer Science
- SIAM J. Optim.
- 2018

This paper is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order, and shows that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small.

Linear Convergence with Condition Number Independent Access of Full Gradients

- Mathematics, Computer Science
- NIPS
- 2013

This paper proposes to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function, and presents a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients.

A New Class of Incremental Gradient Methods for Least Squares Problems

- Mathematics, Computer Science
- SIAM J. Optim.
- 1997

This work embeds both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and proposes a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent.

Semi-Stochastic Gradient Descent Methods

- Mathematics, Computer Science
- Front. Appl. Math. Stat.
- 2017

A new method is proposed, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law.
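The epoch structure described above can be sketched roughly as follows. This is a hypothetical minimal version: `s2gd_sketch` and the toy problem are illustrative assumptions, and the inner-loop length is drawn uniformly here rather than from the geometric law used by S2GD.

```python
import numpy as np

def s2gd_sketch(grads, x0, alpha, epochs, max_inner, seed=0):
    """Each epoch: one full gradient, then a random number of
    variance-reduced stochastic gradient steps."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = x0.copy()
    for _ in range(epochs):
        mu = sum(g(x) for g in grads) / n        # single full gradient
        y = x.copy()
        t = int(rng.integers(1, max_inner + 1))  # random inner-loop length
        for _ in range(t):
            i = int(rng.integers(n))
            # stochastic gradient corrected by the stored full gradient
            y = y - alpha * (grads[i](y) - grads[i](x) + mu)
        x = y
    return x

# Toy problem: f_i(x) = 0.5 * (x - b_i)^2, so the minimizer is mean(b) = 2.5.
b = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, bi=bi: (x - bi) for bi in b]
x_hat = s2gd_sketch(grads, np.array([0.0]), alpha=0.1, epochs=100, max_inner=8)
```

Because the correction term `grads[i](y) - grads[i](x) + mu` is an unbiased, low-variance estimate of the full gradient near `x`, the sketch converges to the exact minimizer despite the constant stepsize.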

A Proximal Stochastic Gradient Method with Progressive Variance Reduction

- Mathematics, Computer Science
- SIAM J. Optim.
- 2014

This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
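One multistage pass of such a scheme might be sketched as follows, using an $\ell_1$ regularizer so the proximal step is nontrivial. This is a hypothetical minimal version: `prox_svrg_epoch` and the toy problem are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_svrg_epoch(grads, x, alpha, lam, m, rng):
    """One stage: compute a full gradient once at the anchor x, then take
    m proximal steps along variance-reduced stochastic gradient directions."""
    n = len(grads)
    mu = sum(g(x) for g in grads) / n       # full gradient at the anchor
    y = x.copy()
    for _ in range(m):
        i = int(rng.integers(n))
        v = grads[i](y) - grads[i](x) + mu  # variance-reduced direction
        y = soft_threshold(y - alpha * v, alpha * lam)
    return y

# Toy problem: (1/n) * sum_i 0.5 * (x - b_i)^2 + lam * |x|,
# whose minimizer is soft_threshold(mean(b), lam) = 2.0 here.
rng = np.random.default_rng(1)
b = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, bi=bi: (x - bi) for bi in b]
x = np.array([0.0])
for _ in range(50):
    x = prox_svrg_epoch(grads, x, alpha=0.1, lam=0.5, m=10, rng=rng)
```

Restarting each stage from the latest iterate and refreshing the anchor gradient is what drives the variance of the direction `v` to zero as the iterates approach the solution.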

An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule

- Mathematics, Computer Science
- SIAM J. Optim.
- 1998

We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the…

DSA: Decentralized Double Stochastic Averaging Gradient Algorithm

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2016

The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as a solution alternative that relies on strong convexity of local functions and Lipschitz continuity of local gradients to guarantee linear convergence of the sequence generated by DSA in expectation.