• Corpus ID: 166228562

RSN: Randomized Subspace Newton

Robert Mansel Gower, D. Kovalev, Felix Lieder, Peter Richtárik
We develop a randomized Newton method capable of solving learning problems with huge-dimensional feature spaces, a common setting in applications such as medical imaging, genomics, and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives practitioners the freedom to… 
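The subspace-constrained Newton step described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name, the Gaussian sketching matrix, and the fixed step size are assumptions, not the paper's reference implementation.

```python
import numpy as np

def rsn_step(x, grad, hess, sketch_dim, step_size=1.0, rng=None):
    """One randomized subspace Newton step (illustrative sketch).

    Solves the Newton system restricted to the random subspace
    range(S): direction d = S (S^T H S)^+ S^T g, so only a small
    (sketch_dim x sketch_dim) system is ever formed.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    S = rng.standard_normal((n, sketch_dim))  # Gaussian sketch (one of many valid choices)
    g = grad(x)                               # full gradient at x
    Hs = hess(x) @ S                          # Hessian applied to the sketch
    # Projected Newton system: (S^T H S) lam = S^T g
    lam = np.linalg.lstsq(S.T @ Hs, S.T @ g, rcond=None)[0]
    return x - step_size * (S @ lam)
```

With `sketch_dim` equal to the full dimension the step reduces to an exact Newton step; with a much smaller sketch, each iteration costs only a small linear solve plus Hessian-sketch products.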


Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality
A randomized algorithm with a quadratic convergence rate is proposed for convex optimization problems with a self-concordant, composite, strongly convex objective function, based on performing an approximate Newton step using a random projection of the Hessian.
Zeroth-Order Randomized Subspace Newton Methods
The zeroth-order randomized subspace Newton (ZO-RSN) method is proposed, which estimates projections of the gradient and Hessian by random sketching and finite differences, allowing the Newton step to be computed in a lower-dimensional subspace at small computational cost.
SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums
This work develops a new stochastic average Newton method, which is incremental and cheap to implement when solving regularized generalized linear models, and shows through extensive numerical experiments that SAN requires no knowledge of the problem and no parameter tuning, while remaining competitive with classical variance-reduced gradient methods.
Precise expressions for random projections: Low-rank approximation and randomized Newton
This work exploits recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching, and enables precise analysis of these methods in terms of spectral properties of the data.
Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence
It is shown that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still enjoys a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
Sketched Newton-Raphson
By showing that SNR can be interpreted as a variant of the stochastic gradient descent (SGD) method, this work is able to leverage proof techniques of SGD and establish a global convergence theory and rates of convergence for SNR.
Learning-Augmented Sketches for Hessians
It is shown empirically that learned sketches, compared with their "non-learned" counterparts, improve the approximation accuracy for a large number of important problems, including LASSO, SVM, and matrix estimation with nuclear norm constraints.
Scalable subspace methods for derivative-free nonlinear least-squares optimization
A general framework for large-scale model-based derivative-free optimization based on iterative minimization within random subspaces is presented, together with a probabilistic worst-case complexity analysis and high-probability bounds on the number of iterations.
SONIA: A Symmetric Blockwise Truncated Optimization Algorithm
Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases, and a stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees.
Regularized Newton Method with Global O(1/k²) Convergence
A Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians is presented, and it is proved that locally the method converges superlinearly when the objective is strongly convex.


Stochastic Block BFGS: Squeezing More Curvature out of Data
Numerical tests on large-scale logistic regression problems reveal that the proposed novel limited-memory stochastic block BFGS update is more robust and substantially outperforms current state-of-the-art methods.
Randomized Iterative Methods for Linear Systems
A novel, fundamental, and surprisingly simple randomized iterative method for solving consistent linear systems is proposed; the method is governed by two user-defined parameters, and the analysis allows a much wider selection of these parameters, leading to a number of new specific methods.
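When the random sketch selects a single row at a time, this sketch-and-project family reduces to the classical randomized Kaczmarz method. A minimal illustrative sketch, assuming a consistent system and the standard squared-row-norm sampling:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=1000, rng=None):
    """Randomized Kaczmarz: sketch-and-project with a single-row sketch.

    Each step projects the current iterate onto the hyperplane defined
    by one randomly chosen equation A[i] @ x = b[i].
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    row_norms = np.sum(A * A, axis=1)
    probs = row_norms / row_norms.sum()   # sample rows ∝ squared norm
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]  # orthogonal projection
    return x
```

For consistent systems the iterates converge linearly in expectation, at a rate governed by the spectrum of A.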
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence
A randomized second-order method for optimization known as the Newton Sketch, based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian, is proposed, which has super-linear convergence with exponentially high probability and convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities.
SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization
Unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice.
Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares
This work provides a general lower bound on any randomized method that sketches both the data matrix and vector in a least-squares problem, and presents a new method, the iterative Hessian sketch, which approximates the original least-squares problem using a projection dimension proportional to the statistical complexity of the least-squares minimizer and a logarithmic number of iterations.
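The key idea, sketching only the data matrix while keeping the exact gradient, can be illustrated for the unconstrained case. The Gaussian sketch, dimensions, and iteration count below are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def ihs_lstsq(A, b, sketch_dim, iters=30, rng=None):
    """Iterative Hessian sketch, unconstrained least squares (illustrative).

    Each iteration replaces the Hessian A^T A by the sketched Hessian
    (S A)^T (S A) but uses the exact gradient, giving a Newton-like step
    whose error contracts geometrically.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.standard_normal((sketch_dim, m)) / np.sqrt(sketch_dim)
        SA = S @ A                         # sketched data matrix
        grad = A.T @ (A @ x - b)           # exact gradient of the full problem
        x = x - np.linalg.solve(SA.T @ SA, grad)
    return x
```

Note that the sketch is refreshed every iteration, which is what allows the method to drive the error below the accuracy of any single sketch.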
A flexible coordinate descent method
We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information.
On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning
Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.
Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory
We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix
Randomized Block Cubic Newton Method
RBCN is the first algorithm with these properties: it generalizes several existing methods, matches the best known bounds in all special cases, and outperforms the state of the art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems
Y. Nesterov, SIAM J. Optim., 2012
Surprisingly enough, for certain classes of objective functions, the proposed methods for solving huge-scale optimization problems beat the standard worst-case bounds for deterministic algorithms.
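The flavor of such coordinate descent methods can be shown on a convex quadratic, where each coordinate subproblem is solved exactly at O(n) cost per step. This is a toy sketch; the quadratic objective and uniform coordinate sampling are assumptions for illustration:

```python
import numpy as np

def random_coordinate_descent(A, b, iters=4000, rng=None):
    """Randomized coordinate descent on f(x) = 0.5 x^T A x - b^T x,
    with A symmetric positive definite. Each step minimizes f exactly
    along one randomly chosen coordinate direction.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = b.size
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)
        # exact coordinate minimization: partial derivative over A[i, i]
        x[i] -= (A[i] @ x - b[i]) / A[i, i]
    return x
```

Each iteration touches only one row of A, which is what makes the per-step cost independent of the number of coordinates updated elsewhere and the method viable at huge scale.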