Corpus ID: 9440787

Equilibrated adaptive learning rates for non-convex optimization

@inproceedings{Dauphin2015EquilibratedAL,
  title={Equilibrated adaptive learning rates for non-convex optimization},
  author={Yann Dauphin and Harm de Vries and Yoshua Bengio},
  booktitle={NIPS},
  year={2015}
}
  • Yann Dauphin, Harm de Vries, Yoshua Bengio
  • Published in NIPS 2015
  • Computer Science, Mathematics
  • Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we examine how considering the presence of negative eigenvalues of the Hessian can help us design better-suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner… (see the preconditioning sketch below)
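    The abstract sketches the paper's core idea: precondition SGD with the equilibration matrix, whose diagonal contains the row norms of the Hessian and therefore stays meaningful even when the Hessian has negative eigenvalues. Below is a minimal NumPy sketch of that idea, not the authors' released implementation: it estimates sqrt(diag(H^2)) from the unbiased estimator (Hv)^2 with v ~ N(0, I) and a finite-difference Hessian-vector product; the function names, toy objective, and hyperparameters are illustrative assumptions.

        import numpy as np

        def hvp(grad_fn, theta, v, eps=1e-4):
            # Finite-difference Hessian-vector product:
            # Hv ~= (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)
            return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

        def equilibrated_sgd(grad_fn, theta, lr=0.05, n_steps=2000, damping=1e-4, seed=0):
            # Precondition gradient steps by an estimate of sqrt(diag(H^2)),
            # using the identity E[(Hv)^2] = diag(H^2) for v ~ N(0, I).
            rng = np.random.default_rng(seed)
            d2 = np.zeros_like(theta)          # running mean of (Hv)^2
            for t in range(1, n_steps + 1):
                g = grad_fn(theta)
                v = rng.standard_normal(theta.shape)
                hv = hvp(grad_fn, theta, v)
                d2 += (hv ** 2 - d2) / t       # online average
                theta = theta - lr * g / (np.sqrt(d2) + damping)
            return theta

        # Toy non-convex objective f(x, y) = x^2 - y^2 + y^4, with a saddle point at the origin.
        def toy_grad(theta):
            x, y = theta
            return np.array([2.0 * x, -2.0 * y + 4.0 * y ** 3])

        print(equilibrated_sgd(toy_grad, np.array([1.0, 1e-3])))  # should approach (0, +/-1/sqrt(2))

    On this toy saddle problem the equilibrated scaling keeps the effective step size sensible in both the positive- and negative-curvature directions, which is the motivation the abstract gives for moving beyond the Jacobi (diagonal-Hessian) preconditioner.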
    180 Citations
    • Online Second Order Methods for Non-Convex Stochastic Optimizations. X. Li. 2018.
    • Improving Generalization Performance of Adaptive Learning Rate by Switching from Block Diagonal Matrix Preconditioning to SGD. Y. Ida, Y. Fujiwara. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
    • Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks.
    • Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications.
    • Preconditioned Stochastic Gradient Descent. X. Li. IEEE Transactions on Neural Networks and Learning Systems, 2018.
    • On the Performance of Preconditioned Stochastic Gradient Descent. X. Li. arXiv, 2018.
    • An adaptive Hessian approximated stochastic gradient MCMC method.
    • On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization.
