Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

@article{Mokhtari2018SurpassingGD,
  title={Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate},
  author={Aryan Mokhtari and Mert G{\"u}rb{\"u}zbalaban and Alejandro Ribeiro},
  journal={ArXiv},
  year={2018},
  volume={abs/1611.00347}
}
Recently, there has been growing interest in developing optimization methods for solving large-scale machine learning problems. Most of these problems boil down to minimizing an average of a finite set of smooth and strongly convex functions, where the number of functions $n$ is large. The gradient descent (GD) method is successful in minimizing such problems at a fast linear rate; however, it is not applicable to the considered large-scale optimization setting because of the high…
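For context, the finite-sum problem that recurs throughout the papers listed below can be written (in generic notation; the symbols $f_i$, $x$, and $\alpha$ are illustrative choices, not taken verbatim from the paper) as

$$\min_{x \in \mathbb{R}^p} \; f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

where each $f_i$ is smooth and strongly convex. Gradient descent must evaluate all $n$ component gradients to form one step, $x^{k+1} = x^k - \alpha \nabla f(x^k)$, whereas a cyclic incremental method touches a single component per iteration, schematically

$$x^{k+1} = x^k - \alpha \, \nabla f_{i_k}(x^k), \qquad i_k = (k \bmod n) + 1.$$

This is a sketch of the generic cyclic incremental gradient step, not the specific memory-exploiting update analyzed in the paper.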
Citations

IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate
IQN is the first stochastic quasi-Newton method proven to converge superlinearly in a local neighborhood of the optimal solution, and its local superlinear convergence rate is established.
Curvature-aided incremental aggregated gradient method
The curvature-aided incremental aggregated gradient method exploits an incrementally aggregated Hessian matrix to trace the full gradient vector at every incremental step, thereby achieving an improved linear convergence rate over state-of-the-art IAG methods.
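The curvature-aided idea can be pictured as follows (a sketch inferred from the summary above, using generic symbols $y_i^k$ for the stale copies of the iterate; this is not necessarily the authors' exact notation or update): each stale component gradient is corrected with a first-order Taylor term built from the stored Hessian,

$$\nabla f_i(x^k) \;\approx\; \nabla f_i(y_i^k) + \nabla^2 f_i(y_i^k)\,(x^k - y_i^k),$$

so that the incrementally aggregated Hessians let the method track the full gradient while evaluating only one component per step.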
Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate
Finite-sum minimization, i.e., problems where the objective may be written as a sum over a collection of instantaneous costs, is ubiquitous in modern machine learning and data science. Efficient…
Efficient Methods For Large-Scale Empirical Risk Minimization
This thesis introduces a rethinking of ERM in which the training set is covered not by a partition, as in stochastic and distributed optimization, but by a nested collection of subsets that grow geometrically, and introduces an incremental method that exploits memory to achieve a superlinear convergence rate.
Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
This work presents two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions and establishes local linear-quadratic convergence results.
Convergence rates of subgradient methods for quasi-convex optimization problems
This paper investigates the iteration complexity and convergence rates of various subgradient methods for solving quasi-convex optimization problems in a unified framework; it considers a sequence satisfying a general (inexact) basic inequality and establishes global convergence and iteration-complexity results under constant, diminishing, or dynamic stepsize rules.
Incremental Methods for Weakly Convex Optimization
It is shown that all three incremental algorithms, with a geometrically diminishing stepsize and an appropriate initialization, converge to the optimal solution set when the weakly convex function satisfies an additional regularity condition called sharpness.
Curvature-aided Incremental Aggregated Gradient Method
We propose a new algorithm for finite sum optimization which we call the curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the problem of training a classifier for a…
Stochastic Quasi-Newton Methods
Recent developments to accelerate the convergence of stochastic optimization through the exploitation of second-order information are discussed, along with the introduction of an incremental method that exploits memory to achieve a superlinear convergence rate.
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
A distributed asynchronous quasi-Newton algorithm with superlinear convergence guarantees is developed, believed to be the first distributed asynchronous algorithm with such guarantees.

References

Showing 1–10 of 43 references.
Incremental Subgradient Methods for Nondifferentiable Optimization
A number of variants of incremental subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions are established, including some that are stochastic.
A double incremental aggregated gradient method with linear convergence rate for large-scale optimization
It is proved not only that the proposed DIAG method converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over full-batch gradient descent.
Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning
  • J. Mairal
  • Computer Science, Mathematics
  • SIAM J. Optim.
  • 2015
This work proposes an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning, and presents convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error.
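The incremental majorization-minimization idea can be summarized as follows (a generic sketch with hypothetical surrogate symbols $g_i^k$; the precise surrogate conditions are stated in the paper): at iteration $k$ a single index $i_k$ is selected, its surrogate is refreshed to a majorizing approximation of $f_{i_k}$ that is tight at the current iterate, and the next iterate minimizes the average surrogate,

$$x^{k} \in \arg\min_{x} \; \frac{1}{n} \sum_{i=1}^{n} g_i^{k}(x), \qquad g_{i_k}^{k}(x) \ge f_{i_k}(x), \quad g_{i_k}^{k}(x^{k-1}) = f_{i_k}(x^{k-1}).$$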
Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods
This paper is the first study to establish convergence rate properties of the PIAG method for any deterministic order, showing that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small.
Linear Convergence with Condition Number Independent Access of Full Gradients
This paper proposes to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function, and presents a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients.
A New Class of Incremental Gradient Methods for Least Squares Problems
This work embeds both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and proposes a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent.
Semi-Stochastic Gradient Descent Methods
A new method is proposed, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the number following a geometric law.
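The inner step alluded to above is of the variance-reduced form used by SVRG-type methods (a sketch in generic notation, with $y$ the snapshot point at which the full gradient $\nabla f(y)$ was computed at the start of the epoch and $h$ the stepsize; the exact sampling rules are in the paper):

$$x^{t+1} = x^{t} - h\,\big( \nabla f_{i_t}(x^{t}) - \nabla f_{i_t}(y) + \nabla f(y) \big),$$

with the number of inner steps per epoch drawn at random according to a geometric law.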
A Proximal Stochastic Gradient Method with Progressive Variance Reduction
This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule
  • P. Tseng
  • Mathematics, Computer Science
  • SIAM J. Optim.
  • 1998
We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the…
DSA: Decentralized Double Stochastic Averaging Gradient Algorithm
The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as an alternative that relies on strong convexity of local functions and Lipschitz continuity of local gradients to guarantee linear convergence, in expectation, of the sequence generated by DSA.