• Corpus ID: 14784580

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

@article{RoostaKhorasani2016SubSampledNM,
  title   = {Sub-Sampled Newton Methods I: Globally Convergent Algorithms},
  author  = {Farbod Roosta-Khorasani and Michael W. Mahoney},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1601.04737}
}
• Published 18 January 2016
• Computer Science, Mathematics
• ArXiv
Large scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to speed up the computations and/or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms and we provide bounds on the convergence of the variants of Newton's method that incorporate uniform sub-sampling…
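To make the setting concrete, the following is a minimal sketch of a uniformly sub-sampled Newton iteration for a finite-sum objective F(x) = (1/n) Σ_i f_i(x). The helper functions grad_i(i, x) and hess_i(i, x), the damping term, and the fixed step size are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (not the paper's reference code) of sub-sampled Newton:
# exact gradient, uniformly sub-sampled Hessian, damped Newton step.
import numpy as np

def subsampled_newton(x0, grad_i, hess_i, n, sample_size,
                      n_iters=50, step=1.0, damping=1e-8):
    """grad_i(i, x) and hess_i(i, x) return the gradient / Hessian of f_i at x."""
    x = x0.copy()
    d = x.size
    for _ in range(n_iters):
        # exact gradient of F(x) = (1/n) * sum_i f_i(x)
        g = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        # uniform sub-sample of Hessian components
        S = np.random.choice(n, size=sample_size, replace=False)
        H = np.mean([hess_i(i, x) for i in S], axis=0)
        # small damping keeps the sampled Hessian safely invertible
        p = np.linalg.solve(H + damping * np.eye(d), -g)
        x = x + step * p
    return x
```

A globally convergent variant would replace the fixed step above with a line search or a carefully chosen step size; the sketch only shows where the uniform sub-sampling enters the iteration.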
85 Citations


• Mathematics, Computer Science
ArXiv
• 2016
The analysis here can be used to complement the results of the basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.
• Computer Science, Mathematics
• 2018
This paper proposes to compute an approximate Hessian matrix by either uniformly or non-uniformly sub-sampling the components of the objective of an unconstrained optimization model, develops both standard and accelerated adaptive cubic regularization approaches, and provides theoretical guarantees on global iteration complexity.
• Computer Science, Mathematics
ArXiv
• 2016
This work proposes two new efficient Newton-type methods, Refined Sub-sampled Newton and Refined Sketch Newton, which exhibit a great advantage over existing sub-sampling Newton methods, especially when Hessian-vector multiplication can be calculated efficiently.
• Murat A. Erdogdu
• Computer Science, Mathematics
2017 Information Theory and Applications Workshop (ITA)
• 2017
Under certain assumptions, the constrained optimization algorithm is shown to attain a composite convergence rate that is initially quadratic and asymptotically linear; its performance is validated on widely encountered optimization tasks over several real and synthetic datasets, in comparison with classical optimization algorithms.
• Computer Science
• 2016
This paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian but not the gradient (the gradient is assumed to be exact); a minimal sketch of this idea appears at the end of this citation list.
This work proposes an alternative way of constructing the curvature information by formulating it as an estimation problem and applying a Stein-type lemma, which allows further improvements through sub-sampling and eigenvalue thresholding, and achieves the highest performance among various algorithms on several datasets.
• Computer Science, Mathematics
AISTATS
• 2020
Stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models are considered, and the regularized subsampled Newton method (R-SSN) is shown to achieve global linear convergence with an adaptive step size and a constant batch size.
• Computer Science
INFORMS Journal on Optimization
• 2022
This paper describes a stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective and presents worst-case complexity guarantees for a notion of stationarity tailored to the subsampled context.
• Mathematics, Computer Science
INFORMS Journal on Optimization
• 2021
This work presents an adaptive variance reduction scheme for a subsampled Newton method with cubic regularization and shows that the expected Hessian sample complexity is [Formula: see text] for finding an approximate local solution with a second-order guarantee.
• Computer Science
• 2020
This work motivates the extension of Newton methods to the stochastic approximation (SA) regime and argues for the use of the scalable low-rank saddle-free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low-rank approximation.
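One of the citing works above analyzes an inexact Newton method that solves the Newton system approximately with conjugate gradients (CG) on a sub-sampled Hessian while keeping the gradient exact. Below is a minimal sketch of that idea; the helpers full_grad(x) and hess_vec_i(i, x, v) (a per-component Hessian-vector product) are assumptions for illustration, not the cited paper's code.

```python
# Minimal sketch: one inexact Newton step with an exact gradient and a
# uniformly sub-sampled Hessian, solved approximately by conjugate gradients
# using only Hessian-vector products.
import numpy as np

def subsampled_newton_cg_step(x, full_grad, hess_vec_i, n, sample_size,
                              cg_iters=20, cg_tol=1e-4):
    g = full_grad(x)                                           # exact gradient
    S = np.random.choice(n, size=sample_size, replace=False)   # Hessian sample

    def Hv(v):
        # sub-sampled Hessian-vector product (1/|S|) * sum_{i in S} H_i(x) v
        return np.mean([hess_vec_i(i, x, v) for i in S], axis=0)

    # CG on H_S p = -g, stopped early (inexact solve)
    p = np.zeros_like(x)
    r = -g.copy()            # residual for the initial guess p = 0
    q = r.copy()             # search direction
    for _ in range(cg_iters):
        if np.linalg.norm(r) <= cg_tol * np.linalg.norm(g):
            break
        Hq = Hv(q)
        alpha = (r @ r) / (q @ Hq)
        p = p + alpha * q
        r_new = r - alpha * Hq
        q = r_new + ((r_new @ r_new) / (r @ r)) * q
        r = r_new
    return x + p             # unit step; the line search used in practice is omitted
```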

References

Showing 1-10 of 70 references

• Mathematics, Computer Science
ArXiv
• 2016
The analysis here can be used to complement the results of the basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.
• Computer Science
NIPS
• 2015
This paper uses sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses a convergence rate comparable to Newton's method, yet has a much smaller per-iteration cost.
• Computer Science, Mathematics
SIAM J. Optim.
• 2017
The Newton Sketch, a randomized second-order optimization method based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian, is proposed; it has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.
• Computer Science
Math. Program.
• 2016
A general framework is proposed that includes slightly modified versions of existing algorithms as well as a new algorithm using limited-memory BFGS Hessian approximations, together with a novel global convergence rate analysis that covers methods solving subproblems via coordinate descent.
• Computer Science
AISTATS
• 2009
An optimization algorithm for minimizing a smooth function over a convex set by minimizing a diagonal plus low-rank quadratic approximation to the function, which substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
• Computer Science
J. Mach. Learn. Res.
• 2010
A new, efficient, exact line search algorithm that is comparable to or better than specialized state-of-the-art solvers on a number of publicly available data sets, together with proofs of its worst-case time complexity bounds.
An update formula which generates matrices using information from the last m iterations, where m is any number supplied by the user; the BFGS method is considered to be the most efficient.
• Computer Science
Math. Program.
• 2012
A criterion is given for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and an O(1/ε) complexity bound is established on the total cost of a gradient method; a minimal sketch of the sampling test appears at the end of this reference list.
• Computer Science, Mathematics
Numerische Mathematik
• 2011
This work presents two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms.
• Computer Science
Math. Program.
• 2017
Numerical experiments indicate that the new SAG method often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
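One of the references above proposes to grow the sample size when variance estimates indicate that the sampled gradient is unreliable. The following is a minimal sketch of such a norm test; the threshold theta and the growth factor are illustrative assumptions, not values from the cited paper.

```python
# Minimal sketch of a variance-based test for increasing the gradient sample size.
import numpy as np

def maybe_grow_sample(per_example_grads, theta=0.5, growth=1.1):
    """per_example_grads: array of shape (|S|, d) with rows grad f_i(x), i in S.
    Returns the sampled gradient and a suggested sample size for the next step."""
    S_size, _ = per_example_grads.shape
    g_S = per_example_grads.mean(axis=0)                    # sampled gradient estimate
    # sample variance of the per-example gradients, summed over coordinates
    var_S = per_example_grads.var(axis=0, ddof=1).sum()
    # norm test: variance of the mean should be small relative to ||g_S||^2
    if var_S / S_size > (theta ** 2) * float(g_S @ g_S):
        return g_S, int(np.ceil(growth * S_size))           # ask for a larger sample
    return g_S, S_size                                      # current sample size is adequate
```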