# RSN: Randomized Subspace Newton

@inproceedings{Gower2019RSNRS, title={RSN: Randomized Subspace Newton}, author={Robert Mansel Gower and D. Kovalev and Felix Lieder and Peter Richt{\'a}rik}, booktitle={NeurIPS}, year={2019} }

We develop a randomized Newton method capable of solving learning problems with huge-dimensional feature spaces, a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives practitioners the freedom to…
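The subspace-constrained Newton step described in the abstract can be illustrated in a few lines. The sketch below is a minimal, hypothetical implementation for a convex quadratic with a Gaussian sketching matrix; the function name, the fixed step size `lr` (standing in for the paper's relative-smoothness constant), and the problem setup are illustrative assumptions, not code from the paper.

```python
import numpy as np

def rsn_quadratic(A, b, x0, tau=5, lr=1.0, iters=200, rng=None):
    """Randomized Subspace Newton sketch for f(x) = 0.5 x^T A x - b^T x.

    Each step draws a Gaussian sketch S (n x tau), restricts the Newton
    system to the tau-dimensional subspace range(S) by solving
    (S^T A S) lam = S^T grad, then moves along S @ lam.
    """
    rng = np.random.default_rng(rng)
    x = x0.copy()
    n = len(x0)
    for _ in range(iters):
        g = A @ x - b                      # gradient of the quadratic
        S = rng.standard_normal((n, tau))  # Gaussian sketching matrix
        H_s = S.T @ A @ S                  # tau x tau sketched Hessian
        lam = np.linalg.lstsq(H_s, S.T @ g, rcond=None)[0]
        x = x - lr * (S @ lam)             # subspace Newton step
    return x
```

Note that only a tau x tau system is solved per iteration, which is the point of the method: the per-step cost is governed by the sketch size, not the ambient dimension.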

## 31 Citations

Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality

- Computer Science, ICML
- 2021

A randomized algorithm with quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective function, based on performing an approximate Newton step using a random projection of the Hessian.

Zeroth-Order Randomized Subspace Newton Methods

- Computer Science, Mathematics
- 2022

The zeroth-order randomized subspace Newton (ZO-RSN) method is proposed, which estimates projections of the gradient and Hessian by random sketching and finite differences, allowing the Newton step to be computed in a lower-dimensional subspace at small computational cost.

SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums

- Computer Science, AISTATS
- 2022

This work develops a new Stochastic Average Newton method, which is incremental and cheap to implement when solving regularized generalized linear models, and shows through extensive numerical experiments that SAN requires neither knowledge about the problem nor parameter tuning, while remaining competitive with classical variance-reduced gradient methods.

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

- Computer Science, Mathematics, ArXiv
- 2022

It is shown that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still enjoys a superlinear convergence rate nearly matching (up to a logarithmic factor) that of uniform Hessian averaging.

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

- Computer Science, NeurIPS
- 2021

A Stochastic Anderson Mixing (SAM) scheme for nonconvex stochastic optimization problems is proposed by introducing damped projection and adaptive regularization into classical Anderson mixing, and the convergence theory of SAM is established.

SONIA: A Symmetric Blockwise Truncated Optimization Algorithm

- Computer Science, AISTATS
- 2021

Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases, and a stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees.

Regularized Newton Method with Global O(1/k²) Convergence

- Mathematics, Computer Science, ArXiv
- 2021

A Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians is presented, and it is proved that locally the method converges superlinearly when the objective is strongly convex.

Stochastic Steepest Descent Methods for Linear Systems: Greedy Sampling & Momentum

- Computer Science, ArXiv
- 2020

The proposed greedy methods significantly outperform existing methods on a wide variety of datasets, from random test instances to real-world data (LIBSVM, sparse matrices from the Matrix Market collection).

NysADMM: faster composite convex optimization via low-rank approximation

- Computer Science
- 2022

The breadth of problems on which NysADMM beats standard solvers is a surprise, and suggests that ADMM may be a dominant paradigm for numerical optimization across a wide range of statistical learning problems that are usually solved with bespoke methods.

Adaptive and Oblivious Randomized Subspace Methods for High-Dimensional Optimization: Sharp Analysis and Lower Bounds

- Computer Science, Mathematics, IEEE Transactions on Information Theory
- 2022

Experimental results show that the proposed approach enables significant speed ups in a wide variety of machine learning and optimization problems including logistic regression, kernel classification with random convolution layers and shallow neural networks with rectified linear units.

## References

Showing 1–10 of 42 references

Stochastic Block BFGS: Squeezing More Curvature out of Data

- Computer Science, ICML
- 2016

Numerical tests on large-scale logistic regression problems reveal that the proposed novel limited-memory stochastic block BFGS update is more robust and substantially outperforms current state-of-the-art methods.

Randomized Iterative Methods for Linear Systems

- Mathematics, Computer Science, SIAM J. Matrix Anal. Appl.
- 2015

A novel, fundamental and surprisingly simple randomized iterative method for solving consistent linear systems is presented; the method is governed by two user-defined parameters (a random sketching matrix and a positive definite matrix), and a much wider selection of these two parameters leads to a number of new specific methods.
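The sketch-and-project iteration behind this family of methods is compact enough to show directly. The following is a minimal, hypothetical sketch of the Euclidean-metric special case (the positive definite parameter taken as the identity) with a Gaussian row sketch; function name and defaults are illustrative assumptions.

```python
import numpy as np

def sketch_and_project(A, b, x0, tau=1, iters=500, rng=None):
    """Sketch-and-project iteration for a consistent system Ax = b.

    Each step draws a random sketch S of the rows and projects the
    current iterate onto the sketched solution set {y : S^T A y = S^T b}.
    With coordinate sketches and tau = 1 this recovers randomized
    Kaczmarz; Gaussian sketches give a block-type variant.
    """
    rng = np.random.default_rng(rng)
    x = x0.copy()
    m = A.shape[0]
    for _ in range(iters):
        S = rng.standard_normal((m, tau))   # random row sketch
        As = S.T @ A                        # tau x n sketched system
        r = S.T @ (A @ x - b)               # sketched residual
        # Euclidean projection onto {y : S^T A y = S^T b}
        x = x - As.T @ np.linalg.lstsq(As @ As.T, r, rcond=None)[0]
    return x
```

Varying the sketch distribution (and the metric in which the projection is taken) is exactly the "two parameters" the summary refers to.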

Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence

- Computer Science, Mathematics, SIAM J. Optim.
- 2017

A randomized second-order method for optimization known as the Newton Sketch is proposed, based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. The method has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization

- Mathematics, ICML
- 2016

Unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice.

Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares

- Computer Science, J. Mach. Learn. Res.
- 2016

This work provides a general lower bound on any randomized method that sketches both the data matrix and vector in a least-squares problem, and presents a new method known as the iterative Hessian sketch, which can be used to obtain approximations to the original least-squares problem using a projection dimension proportional to the statistical complexity of the least-squares minimizer and a logarithmic number of iterations.
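The iterative Hessian sketch idea — sketch only the quadratic term while keeping the exact gradient — can be sketched compactly. The following is a minimal, hypothetical version for unconstrained least squares with a Gaussian sketch; the function name and parameter defaults are illustrative, not from the paper.

```python
import numpy as np

def iterative_hessian_sketch(A, y, m_sketch, iters=10, rng=None):
    """Iterative Hessian sketch for least squares: min_x ||Ax - y||^2.

    Each round sketches only the data matrix (S A) to approximate the
    Hessian A^T A, keeps the exact residual correlation A^T(y - Ax),
    and solves the small d x d sketched system.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.standard_normal((m_sketch, n)) / np.sqrt(m_sketch)
        SA = S @ A                         # m_sketch x d sketched data
        g = A.T @ (y - A @ x)              # exact negative gradient
        u = np.linalg.solve(SA.T @ SA, g)  # sketched Newton system
        x = x + u
    return x
```

Because the gradient is exact and only the Hessian is sketched, the iterates contract geometrically toward the true least-squares solution rather than stalling at the accuracy of a single sketch.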

A flexible coordinate descent method

- Computer Science, Comput. Optim. Appl.
- 2018

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature)…

On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning

- Computer Science, SIAM J. Optim.
- 2011

Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.

Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

- Mathematics, Computer Science, SIAM J. Matrix Anal. Appl.
- 2020

We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix…

Randomized Block Cubic Newton Method

- Computer Science, Mathematics, ICML
- 2018

RBCN is the first algorithm with these properties: it generalizes several existing methods, matches the best known bounds in all special cases, and outperforms the state of the art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.

Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems

- Computer Science, Mathematics, SIAM J. Optim.
- 2012

Surprisingly enough, for certain classes of objective functions, the complexity bounds of the proposed methods for solving huge-scale optimization problems are better than the standard worst-case bounds for deterministic algorithms.