SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums

Jiabin Chen, Rui Yuan, Guillaume Garrigos, Robert Mansel Gower
We present a principled approach for designing stochastic Newton methods for solving finite sum optimization problems. Our approach has two steps. First, we rewrite the stationarity conditions as a system of nonlinear equations that associates each data point to a new row. Second, we apply a subsampled Newton-Raphson method to solve this system of nonlinear equations. By design, methods developed using our approach are incremental, in that they require only a single data point per iteration…
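To make the two-step approach concrete, here is a minimal toy sketch for the scalar quadratic case f_i(w) = ½(w − a_i)². The stationarity condition is rewritten as one equation per data point, α_i − ∇f_i(w) = 0, plus an averaging equation (1/n)Σα_i = 0, and each iteration samples a single equation and takes the minimal-norm update that solves it (for quadratics the Newton step on the sampled equation is exact, so it coincides with this projection). The function name and sampling scheme are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def san_toy(a, iters=20000, seed=0):
    """Toy incremental Newton sketch for f_i(w) = 0.5 * (w - a_i)**2, scalar w.

    Stationarity is rewritten as the system
        alpha_i - (w - a_i) = 0   for each data point i,
        (1/n) * sum_i alpha_i = 0,
    and each iteration samples ONE equation and projects the current
    iterate (w, alpha) onto its solution set (minimal-norm update).
    """
    rng = np.random.default_rng(seed)
    n = len(a)
    w = 0.0
    alpha = np.zeros(n)
    for _ in range(iters):
        j = rng.integers(n + 1)
        if j == n:
            # Sampled the averaging equation: project alpha onto {sum(alpha) = 0}.
            alpha -= alpha.mean()
        else:
            # Sampled data-point equation j: project (w, alpha_j) onto the
            # hyperplane {alpha_j = w - a_j}; the residual splits equally.
            r = alpha[j] - (w - a[j])
            w += r / 2.0
            alpha[j] -= r / 2.0
    return w

a = np.array([1.0, 2.0, 6.0])
w_star = san_toy(a)
# w_star approaches mean(a) = 3.0, the minimizer of (1/n) * sum_i f_i
```

Note how each iteration touches only one data point: this is the incremental property the abstract describes. The randomized projections onto these affine sets converge to their unique intersection, w = mean(a) with α_i = w − a_i.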
1 Citation


SP2: A Second Order Stochastic Polyak Method

This work develops a method for solving the interpolation equations that uses a local second-order approximation of the model, and uses Hessian-vector products to speed up the convergence of SP.



A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums

A new incremental method with a superlinear convergence rate, the Newton-type incremental method (NIM), which introduces a model of the objective with the same sum-of-functions structure and updates a single component of the model per iteration.

Sketched Newton-Raphson

By showing that SNR can be interpreted as a variant of the stochastic gradient descent (SGD) method, the analysis leverages proof techniques from SGD to establish a global convergence theory and rates of convergence for SNR.

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

This work presents two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions and establishes local linear-quadratic convergence results.

Exact and Inexact Subsampled Newton Methods for Optimization

This paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact).

Greedy Quasi-Newton Methods with Explicit Superlinear Convergence

This paper establishes an explicit non-asymptotic bound on the local superlinear convergence rate of greedy quasi-Newton methods, with a contraction factor depending on the square of the iteration counter, and shows that these methods produce Hessian approximations whose deviation from the exact Hessian converges linearly to zero.

Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

A randomized second-order method for optimization, the Newton Sketch, is proposed; it performs an approximate Newton step using a randomly projected or sub-sampled Hessian, and achieves super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.

IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate

IQN is the first stochastic quasi-Newton method proven to converge superlinearly in a local neighborhood of the optimal solution, and its local superlinear convergence rate is established explicitly.

Sub-sampled Newton methods

For large-scale finite-sum minimization problems, we study non-asymptotic and high-probability global as well as local convergence properties of variants of Newton’s method where the Hessian and/or the gradient are sub-sampled.

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization

Unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice.

Convergence rates of sub-sampled Newton methods

This paper uses sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses comparable convergence rate to Newton's method, yet has much smaller per-iteration cost.