EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks
@inproceedings{Chen2019EACGAA,
  title     = {EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks},
  author    = {Sheng-Wei Chen and Chun-Nan Chou and Edward Y. Chang},
  booktitle = {AAAI},
  year      = {2019}
}
For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method that includes: 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG)-based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which…
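As a rough illustration of the CG-based ingredient (this is the generic Hessian-free recipe, not the paper's EA-CG construction), the sketch below solves the damped Newton system (H + λI)d = −g with conjugate gradient, touching H only through Hessian-vector products. The toy loss, damping value, and function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's EA-CG): solve (H + damping*I) d = -g with CG,
# using Hessian-vector products so the Hessian is never formed explicitly.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy fully-connected layer with a twice-differentiable criterion (MSE).
    pred = jnp.tanh(x @ w)
    return jnp.mean((pred - y) ** 2)

def hvp(w, v, x, y):
    # Hessian-vector product via forward-over-reverse differentiation.
    return jax.jvp(lambda w_: jax.grad(loss)(w_, x, y), (w,), (v,))[1]

def newton_direction(w, x, y, damping=1e-3, iters=50, tol=1e-8):
    # Conjugate gradient on (H + damping*I) d = -g, starting from d = 0.
    g = jax.grad(loss)(w, x, y)
    d = jnp.zeros_like(w)
    r = -g                      # residual for the initial guess d = 0
    p = r
    rs = jnp.vdot(r, r)
    for _ in range(iters):
        Ap = hvp(w, p, x, y) + damping * p
        alpha = rs / jnp.vdot(p, Ap)
        d = d + alpha * p
        r = r - alpha * Ap
        rs_new = jnp.vdot(r, r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d
```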
3 Citations
Deep Residual Partitioning
- Computer Science
- 2020
This work introduces residual partitioning, a novel second-order optimization method for training neural nets that converges to a competitive or better solution on several machine learning tasks.
Laplace Approximation for Uncertainty Estimation of Deep Neural Networks
- Computer Science
- 2019
The most popular deep neural network architectures are compared based on their compliance with uncertainty estimation by Laplace approximation, empirically assessing the method's potential and deficiencies as well as its applicability to large models and datasets, while working towards an understanding of how architectural choices correlate with the quality of the obtained uncertainty estimates.
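For context on the technique named in this citation, here is a minimal sketch of the basic Laplace idea: approximate the weight posterior by a Gaussian centred at the MAP estimate with covariance given by the inverse Hessian of the negative log-posterior, then sample weights to estimate predictive uncertainty. The toy model, helper names, and the explicit Hessian inversion (only feasible for tiny models) are assumptions, not the cited paper's code.

```python
# Hedged sketch of the Laplace approximation for a tiny regression model:
# posterior over weights ~ N(w_MAP, H^{-1}), with H the Hessian of the
# negative log-posterior at w_MAP.
import jax
import jax.numpy as jnp

def neg_log_posterior(w, x, y, prior_prec=1.0):
    pred = jnp.tanh(x @ w)
    nll = 0.5 * jnp.sum((pred - y) ** 2)
    return nll + 0.5 * prior_prec * jnp.sum(w ** 2)

def laplace_predictive(w_map, x_train, y_train, x_test, n_samples=100, key=None):
    key = key if key is not None else jax.random.PRNGKey(0)
    H = jax.hessian(neg_log_posterior)(w_map, x_train, y_train)
    cov = jnp.linalg.inv(H + 1e-6 * jnp.eye(H.shape[0]))   # jitter for stability
    samples = jax.random.multivariate_normal(key, w_map, cov, shape=(n_samples,))
    preds = jnp.tanh(x_test @ samples.T)                    # one column per weight sample
    return preds.mean(axis=1), preds.std(axis=1)            # predictive mean and spread
```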
Tractable structured natural gradient descent using local parameterizations
- Computer Science, ICML
- 2021
This work generalizes the exponential natural evolution strategy, recovers existing Newton-like algorithms, yields new structured second-order algorithms, and gives new algorithms to learn covariances of Gaussian and Wishart-based distributions.
References
Block-diagonal Hessian-free Optimization for Training Neural Networks
- Computer Science, ArXiv
- 2017
Experiments on deep autoencoders, deep convolutional networks, and multilayer LSTMs demonstrate better convergence and generalization compared to the original Hessian-free approach and the Adam method.
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
- Computer Science, ICML
- 2015
K-FAC is an efficient method for approximating natural gradient descent in neural networks, based on an efficiently invertible approximation of the network's Fisher information matrix that is neither diagonal nor low-rank and, in some cases, is completely non-sparse.
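The core K-FAC idea, sketched loosely for a single fully-connected layer (shapes, names, damping, and the glossed-over vectorization convention are assumptions, not the cited implementation): the layer's Fisher block is approximated by a Kronecker product of an input second-moment factor A and an output-gradient second-moment factor G, so applying the inverse reduces to two small matrix inverses.

```python
# Schematic Kronecker-factored preconditioning for one fully-connected layer.
import jax.numpy as jnp

def kfac_precondition(grad_W, acts, grads_out, damping=1e-2):
    # acts: (batch, in_dim) layer inputs; grads_out: (batch, out_dim) output grads;
    # grad_W: (in_dim, out_dim) gradient of the layer's weight matrix.
    A = acts.T @ acts / acts.shape[0]                  # input second-moment factor
    G = grads_out.T @ grads_out / grads_out.shape[0]   # output-gradient factor
    A_inv = jnp.linalg.inv(A + damping * jnp.eye(A.shape[0]))
    G_inv = jnp.linalg.inv(G + damping * jnp.eye(G.shape[0]))
    # Applying the inverse of the Kronecker-factored block to vec(grad_W)
    # corresponds to A_inv @ grad_W @ G_inv for this shape convention.
    return A_inv @ grad_W @ G_inv
```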
Practical Gauss-Newton Optimisation for Deep Learning
- Computer Science, ICML
- 2017
A side result of this work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
Second-order stagewise backpropagation for Hessian-matrix analyses and investigation of negative curvature
- Computer Science, Neural Networks
- 2008
A Kronecker-factored approximate Fisher matrix for convolution layers
- Computer Science, ICML
- 2016
Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the…
Deep learning via Hessian-free optimization
- Computer Science, ICML
- 2010
A second-order optimization method based on the "Hessian-free" approach is developed and applied to training deep auto-encoders, obtaining results superior to those reported by Hinton & Salakhutdinov (2006).
Fast Exact Multiplication by the Hessian
- Computer Science, Neural Computation
- 1994
This work derives a technique that directly calculates Hv, where v is an arbitrary vector, and shows that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
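The "Pearlmutter trick" this reference describes can be reproduced in a few lines with modern autodiff: differentiate the gradient along a direction v instead of forming H. The tiny test function below is my own example, not the paper's.

```python
# Hessian-vector product without forming the Hessian, checked against an
# explicit Hessian on a toy function (only feasible for tiny problems).
import jax
import jax.numpy as jnp

def f(w):
    return jnp.sum(jnp.tanh(w) ** 2) + 0.5 * jnp.dot(w, w)

w = jnp.array([0.3, -1.2, 0.7])
v = jnp.array([1.0, 0.5, -2.0])

# Forward-over-reverse: H v = d/de [ grad f(w + e v) ] at e = 0.
hv = jax.jvp(jax.grad(f), (w,), (v,))[1]

# Explicit check.
H = jax.hessian(f)(w)
assert jnp.allclose(hv, H @ v, atol=1e-5)
```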
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
- Computer Science, Neural Computation
- 2002
We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using…
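In the same spirit, a Gauss-Newton-style curvature-matrix-vector product G v = J^T H_out J v can be computed in linear time with one forward-mode and one reverse-mode pass; the toy network, loss, and function names below are assumptions used only to make the sketch self-contained.

```python
# Gauss-Newton (GGN) vector product without materializing the curvature matrix.
import jax
import jax.numpy as jnp

def net(w, x):
    return jnp.tanh(x @ w)              # toy "network" output z = f(w)

def out_loss(z, y):
    return 0.5 * jnp.sum((z - y) ** 2)  # loss as a function of the output

def ggn_vector_product(w, v, x, y):
    f = lambda w_: net(w_, x)
    z, jv = jax.jvp(f, (w,), (v,))                                        # J v (forward mode)
    h_jv = jax.jvp(lambda z_: jax.grad(out_loss)(z_, y), (z,), (jv,))[1]  # H_out (J v)
    _, vjp_fn = jax.vjp(f, w)
    return vjp_fn(h_jv)[0]                                                # J^T H_out J v
```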
Understanding the difficulty of training deep feedforward neural networks
- Computer Science, AISTATS
- 2010
The objective here is to better understand why standard gradient descent from random initialization performs so poorly on deep neural networks, to shed light on these recent relative successes, and to help design better algorithms in the future.
Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons
- Computer Science, Neural Computation
- 2000
An adaptive method for directly obtaining the inverse of the Fisher information matrix is proposed; it generalizes the adaptive Gauss-Newton algorithms and provides a solid theoretical justification for them.
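As a generic illustration of maintaining an inverse Fisher estimate online (a standard Sherman-Morrison rank-one update, not necessarily the cited paper's exact recursion), one can fold each per-sample gradient into a running inverse and use it to precondition the update:

```python
# Online inverse-Fisher estimate via rank-one (Sherman-Morrison) updates,
# treating F as a decayed average of g g^T over per-sample gradients g.
import jax.numpy as jnp

def update_inverse_fisher(F_inv, g, decay=0.99):
    # New estimate: F_new = decay * F_old + (1 - decay) * g g^T,
    # inverted incrementally with the Sherman-Morrison formula.
    F_inv = F_inv / decay                      # inverse of decay * F_old
    Fg = F_inv @ g
    coeff = (1.0 - decay) / (1.0 + (1.0 - decay) * jnp.dot(g, Fg))
    return F_inv - coeff * jnp.outer(Fg, Fg)

def natural_gradient_step(w, g, F_inv, lr=1e-2):
    F_inv = update_inverse_fisher(F_inv, g)
    return w - lr * (F_inv @ g), F_inv         # preconditioned step and new estimate
```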