Corpus ID: 6318468

Krylov Subspace Descent for Deep Learning

  • Oriol Vinyals, D. Povey
  • Published in AISTATS 2012
  • Computer Science, Mathematics
  • In this paper, we propose a second-order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples are high. In our method, we construct on each iteration a Krylov subspace formed by the gradient and an approximation to the Hessian matrix, and then use a subset of the training data samples to optimize over this subspace. As with the Hessian Free (HF) method of Martens (2010), the Hessian matrix is never explicitly constructed, and…
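The abstract's core idea — a Krylov subspace spanned by the gradient and repeated Hessian-vector products, with the Hessian never formed explicitly, then a small optimization restricted to that subspace — can be sketched on a toy quadratic. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the paper's stochastic subsampling and preconditioning are omitted, and the exact Hessian-vector product of a quadratic stands in for the Pearlmutter-style product used on neural networks.

```python
import numpy as np

def krylov_subspace(grad, hvp, k):
    """Orthonormal basis for span{g, Hg, H^2 g, ..., H^{k-1} g},
    built from Hessian-vector products only (H is never formed)."""
    n = grad.size
    V = np.zeros((n, k))
    v = grad / np.linalg.norm(grad)
    for j in range(k):
        V[:, j] = v
        w = hvp(v)
        # Gram-Schmidt against all basis vectors built so far
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:           # subspace became invariant early
            return V[:, :j + 1]
        v = w / nrm
    return V

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so the Hessian is exactly A.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20)); A = A @ A.T + np.eye(20)
b = rng.standard_normal(20)
x = np.zeros(20)

g = A @ x - b                                  # gradient at x
V = krylov_subspace(g, lambda v: A @ v, k=5)   # 5-dim Krylov basis

# Minimize f over x + V a: a tiny k-dimensional problem.
a = np.linalg.solve(V.T @ A @ V, V.T @ (b - A @ x))
x_new = x + V @ a
```

For a quadratic this subspace step coincides with several iterations of conjugate gradients; in the paper the coefficients over the subspace are instead fit on a held-out subset of training data, which is what distinguishes the method from HF.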
    102 Citations
    • Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling (20 citations)
    • Practical Quasi-Newton Methods for Training Deep Neural Networks (3 citations)
    • Revisiting Natural Gradient for Deep Networks (216 citations, highly influenced)
    • Distributed Hessian-Free Optimization for Deep Neural Network (10 citations)
    • Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks (9 citations)
    • Large Scale Distributed Hessian-Free Optimization for Deep Neural Network (9 citations)
    • A Kronecker-factored approximate Fisher matrix for convolution layers (81 citations, highly influenced)
    • Optimizing Neural Networks with Kronecker-factored Approximate Curvature (381 citations)


    References
    • Deep learning via Hessian-free optimization (722 citations, highly influential)
    • On optimization methods for deep learning (768 citations)
    • Fast Exact Multiplication by the Hessian (482 citations)
    • On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning (189 citations)
    • Enriched Methods for Large-Scale Unconstrained Optimization (42 citations)
    • Understanding the difficulty of training deep feedforward neural networks (9,251 citations)
    • Natural Gradient Works Efficiently in Learning (2,464 citations)