Corpus ID: 15085443

Revisiting Natural Gradient for Deep Networks

@article{Pascanu2014RevisitingNG,
  title={Revisiting Natural Gradient for Deep Networks},
  author={Razvan Pascanu and Yoshua Bengio},
  journal={CoRR},
  year={2014},
  volume={abs/1301.3584}
}
  • Razvan Pascanu, Yoshua Bengio
  • Published 2014
  • Computer Science, Mathematics
  • CoRR
  • We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods for training deep models: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012) and TONGA (Le Roux et al., 2008). We describe how one can use unlabeled data to improve the generalization error obtained by natural gradient and…
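
The abstract's core object, the natural gradient step, preconditions the average gradient with the inverse Fisher information matrix. The sketch below is a minimal illustration only, not the paper's method: it uses a damped empirical Fisher built from per-example gradients and an explicit linear solve, whereas the paper works with the Fisher of the model's own output distribution and, for large networks, truncated linear solvers. The function name, learning rate, and damping value are illustrative assumptions.

import numpy as np

def natural_gradient_step(per_example_grads, lr=0.1, damping=1e-3):
    # per_example_grads: array of shape (N, P), one gradient row per example.
    G = np.asarray(per_example_grads)
    g = G.mean(axis=0)                      # ordinary minibatch gradient
    F = G.T @ G / G.shape[0]                # empirical Fisher approximation
    F += damping * np.eye(F.shape[0])       # Tikhonov damping for invertibility
    return -lr * np.linalg.solve(F, g)      # step along -F^{-1} g

Usage (illustrative): params = params + natural_gradient_step(minibatch_grads).
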
    218 Citations
    • Natural Neural Networks (128 citations)
    • Block-diagonal Hessian-free Optimization for Training Neural Networks (10 citations)
    • Combining Natural Gradient with Hessian Free Methods for Sequence Training (2 citations)
    • Two-Level K-FAC Preconditioning for Deep Learning
    • Training Neural Networks with Stochastic Hessian-Free Optimization (37 citations)
    • Understanding symmetries in deep networks (16 citations)
    • Sharp Minima Can Generalize For Deep Nets (321 citations)
    • Exact natural gradient in deep linear networks and its application to the nonlinear case (22 citations, highly influenced)
    • Continual Learning With Extended Kronecker-Factored Approximate Curvature (2 citations)

    References

    Showing 1-10 of 44 references
    • Krylov Subspace Descent for Deep Learning (103 citations, highly influential)
    • Training Neural Networks with Stochastic Hessian-Free Optimization (37 citations)
    • Learning Recurrent Neural Networks with Hessian-Free Optimization (539 citations)
    • Deep learning via Hessian-free optimization (728 citations, highly influential)
    • Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines (29 citations)
    • Natural conjugate gradient training of multilayer perceptrons (12 citations)
    • Why Does Unsupervised Pre-training Help Deep Learning? (1,293 citations)
    • A fast natural Newton method (44 citations, highly influential)
    • The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use (7 citations)
    • Topmoumoute Online Natural Gradient Algorithm (161 citations)