Equilibrated adaptive learning rates for non-convex optimization
@inproceedings{Dauphin2015EquilibratedAL,
  title={Equilibrated adaptive learning rates for non-convex optimization},
  author={Yann Dauphin and H. D. Vries and Yoshua Bengio},
  booktitle={NIPS},
  year={2015}
}
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us design better suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner…
189 Citations
Online Second Order Methods for Non-Convex Stochastic Optimizations
- Mathematics, Computer Science
- 2018
Improving Generalization Performance of Adaptive Learning Rate by Switching from Block Diagonal Matrix Preconditioning to SGD
- Computer Science
- 2020 International Joint Conference on Neural Networks (IJCNN)
- 2020
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
- Computer Science, Mathematics
- AAAI
- 2016
- 161
Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks
- Computer Science
- IJCAI
- 2017
- 10
- Highly Influenced
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
- Computer Science, Mathematics
- ArXiv
- 2019
- 3
Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications
- Computer Science, Mathematics
- ArXiv
- 2020
Preconditioned Stochastic Gradient Descent
- Mathematics, Computer Science
- IEEE Transactions on Neural Networks and Learning Systems
- 2018
- 37
On the Performance of Preconditioned Stochastic Gradient Descent
- Computer Science, Mathematics
- ArXiv
- 2018
Scalable Adaptive Stochastic Optimization Using Random Projections
- Computer Science, Mathematics
- NIPS
- 2016
- 15
An adaptive Hessian approximated stochastic gradient MCMC method
- Computer Science, Mathematics
- ArXiv
- 2020
31 References
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2011
- 6,444
- Highly Influential
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
- Computer Science, Mathematics
- NIPS
- 2014
- 864
Stochastic Spectral Descent for Restricted Boltzmann Machines
- Mathematics, Computer Science
- AISTATS
- 2015
- 29