Corpus ID: 166228562

RSN: Randomized Subspace Newton

@inproceedings{Gower2019RSNRS,
  title={RSN: Randomized Subspace Newton},
  author={Robert Mansel Gower and Dmitry Kovalev and Felix Lieder and Peter Richt{\'a}rik},
  booktitle={NeurIPS},
  year={2019}
}
We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to… 
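
The abstract's core idea, restricting the Newton direction to the span of a random sketching matrix S so that only a small projected system involving S^T (Hessian) S has to be solved, can be illustrated with a short code sketch. The NumPy code below is a minimal illustration under assumed choices (a Gaussian sketch, a fixed sketch dimension, a constant unit step size, and the illustrative name rsn_step); it is not the paper's exact algorithm or its recommended parameters.

import numpy as np

def rsn_step(x, grad, hess, sketch_dim=20, step_size=1.0, rng=None):
    """One illustrative randomized-subspace Newton step.

    The search direction is restricted to range(S) for a random sketch S,
    so only the small (sketch_dim x sketch_dim) projected Hessian S^T H S
    is ever factorized."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    S = rng.standard_normal((n, sketch_dim))          # assumed Gaussian sketch
    g = grad(x)                                       # full gradient
    SHS = S.T @ hess(x) @ S                           # small projected Hessian
    Sg = S.T @ g                                      # projected gradient
    lam = np.linalg.lstsq(SHS, Sg, rcond=None)[0]     # handles a singular S^T H S
    return x - step_size * (S @ lam)                  # move within the sketched subspace

# Toy usage on a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)                           # positive definite Hessian
b = rng.standard_normal(n)
x = np.zeros(n)
for _ in range(300):
    x = rsn_step(x, grad=lambda v: A @ v - b, hess=lambda v: A, rng=rng)
print("residual norm:", np.linalg.norm(A @ x - b))

Forming the full Hessian as above is only for brevity; in a large-scale setting one would assemble just the projected quantities S^T ∇²f(x) S and S^T ∇f(x), for example via Hessian-vector products, which is what makes the subspace step cheap.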

Citations

Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality
TLDR
Proposes a randomized algorithm with a quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective, based on performing an approximate Newton step using a random projection of the Hessian.
Zeroth-Order Randomized Subspace Newton Methods
TLDR
Proposes the zeroth-order randomized subspace Newton (ZO-RSN) method, which estimates projections of the gradient and Hessian by random sketching and finite differences, allowing the Newton step to be computed in a lower-dimensional subspace at small computational cost.
SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums
TLDR
This work develops a new Stochastic Average Newton (SAN) method that is incremental and cheap to implement when solving regularized generalized linear models, and shows through extensive numerical experiments that SAN requires neither knowledge of the problem nor parameter tuning, while remaining competitive with classical variance-reduced gradient methods.
Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence
TLDR
It is shown that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still enjoys a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
TLDR
By introducing damped projection and adaptive regularization into classical Anderson mixing (AM), a Stochastic Anderson Mixing (SAM) scheme for solving nonconvex stochastic optimization problems is proposed and its convergence theory is established.
SONIA: A Symmetric Blockwise Truncated Optimization Algorithm
TLDR
Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases, and a stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees.
Regularized Newton Method with Global O(1/k2) Convergence
TLDR
A Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians is presented, and it is proved that locally the method converges superlinearly when the objective is strongly convex.
Stochastic Steepest Descent Methods for Linear Systems: Greedy Sampling & Momentum
TLDR
The proposed greedy methods significantly outperform existing methods on a wide variety of datasets, including random test instances as well as real-world datasets (LIBSVM, sparse datasets from the Matrix Market collection).
NysADMM: faster composite convex optimization via low-rank approximation
TLDR
The breadth of problems on which NysADMM beats standard solvers is a surprise and suggests that ADMM is a dominant paradigm for numerical optimization across a wide range of statistical learning problems that are usually solved with bespoke methods.
Adaptive and Oblivious Randomized Subspace Methods for High-Dimensional Optimization: Sharp Analysis and Lower Bounds
TLDR
Experimental results show that the proposed approach enables significant speedups in a wide variety of machine learning and optimization problems, including logistic regression, kernel classification with random convolution layers, and shallow neural networks with rectified linear units.

References

SHOWING 1-10 OF 42 REFERENCES
Stochastic Block BFGS: Squeezing More Curvature out of Data
TLDR
Numerical tests on large-scale logistic regression problems reveal that the proposed novel limited-memory stochastic block BFGS update is more robust and substantially outperforms current state-of-the-art methods.
Randomized Iterative Methods for Linear Systems
TLDR
A novel, fundamental and surprisingly simple randomized iterative method for solving consistent linear systems that allows for a much wider selection of the method's two defining parameters, leading to a number of new specific methods.
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence
TLDR
A randomized second-order method for optimization known as the Newton Sketch, based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian, is proposed, which has super-linear convergence with exponentially high probability and convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.
SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization
TLDR
Unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice.
Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares
TLDR
This work provides a general lower bound on any randomized method that sketches both the data matrix and vector in a least-squares problem, and presents a new method known as the iterative Hessian sketch, which can be used to obtain approximations to the original least-squares problem using a projection dimension proportional to the statistical complexity of the least-squares minimizer and a logarithmic number of iterations.
A flexible coordinate descent method
We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information…
On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning
TLDR
Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.
Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory
We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix…
Randomized Block Cubic Newton Method
TLDR
RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases, and outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems
Y. Nesterov, SIAM J. Optim., 2012
TLDR
Surprisingly enough, for certain classes of objective functions, the complexity bounds obtained for the proposed methods for solving huge-scale optimization problems are better than the standard worst-case bounds for deterministic algorithms.