On the distance between two neural networks and the stability of learning
@article{Bernstein2020OnTD,
  title={On the distance between two neural networks and the stability of learning},
  author={J. Bernstein and Arash Vahdat and Yisong Yue and MingYu Liu},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.03432}
}
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems not to require learning rate grid search, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is available at the URL given in the arXiv abstract.
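The abstract names deep relative trust without defining it. As a rough sketch only, the snippet below assumes the distance is a product over layers of one plus the relative size of each layer's perturbation, minus one, measured with Frobenius norms. That form is our reading of the paper and is not stated in this summary; the function names and the toy weights are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_distance_bound(weights, perturbations):
    """Assumed form: prod_l (1 + ||dW_l|| / ||W_l||) - 1.

    `weights` and `perturbations` are lists of per-layer matrices.
    This is an illustrative reading of the distance, not the authors' code.
    """
    product = 1.0
    for w, dw in zip(weights, perturbations):
        product *= 1.0 + frobenius_norm(dw) / frobenius_norm(w)
    return product - 1.0


# Toy two-layer network (hypothetical values): each layer is perturbed
# by 10% of its own norm.
weights = [[[3.0, 0.0], [0.0, 4.0]], [[1.0, 0.0], [0.0, 0.0]]]
perturbations = [[[0.3, 0.0], [0.0, 0.4]], [[0.1, 0.0], [0.0, 0.0]]]

bound = relative_distance_bound(weights, perturbations)
print(round(bound, 6))
```

Under this assumed form the quantity compounds across layers, so a fixed relative perturbation per layer yields a larger value as depth grows, which is consistent with the abstract's emphasis on relative rather than absolute parameter distance.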
5 Citations
Learning compositional functions via multiplicative weight updates
- Computer Science, Mathematics
- NeurIPS
- 2020
- 5
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
- Computer Science, Mathematics
- NeurIPS
- 2020
- 13
High-Performance Large-Scale Image Recognition Without Normalization
- Computer Science, Mathematics
- ArXiv
- 2021
- 1
A Systematic Approach to Generating Accurate Neural Network Potentials: the Case of Carbon.
- Materials Science, Physics
- 2020