Corpus ID: 211069156

On the distance between two neural networks and the stability of learning

@article{Bernstein2020OnTD,
  title={On the distance between two neural networks and the stability of learning},
  author={Jeremy Bernstein and Arash Vahdat and Yisong Yue and Ming-Yu Liu},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.03432}
}
  • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems not to require learning rate grid search, it may unlock a simpler workflow for training deeper and more complex neural networks. Please find the Python code used in this paper here: this https URL. 
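
  Illustration (not from the paper): since the abstract does not spell out the resulting learning rule, the following is a minimal NumPy sketch, assuming a per-layer relative update in which each layer's step is scaled by the ratio of its weight norm to its gradient norm, so every layer changes by roughly the same relative fraction per iteration. The function names, the toy network, and the norm correction below are assumptions made for illustration, not the authors' released implementation.

# Minimal sketch, assuming a layer-wise relative update; not the authors' released code.
import numpy as np

def relative_update(weights, grads, eta=0.01, eps=1e-12):
    """Scale each layer's step by ||W|| / ||grad W|| so that every layer
    moves by roughly the same relative amount eta per iteration."""
    updated = []
    for W, g in zip(weights, grads):
        step = eta * (np.linalg.norm(W) / (np.linalg.norm(g) + eps)) * g
        # Divide by sqrt(1 + eta^2) so the weight norm does not drift upward
        # over many steps (one possible correction, assumed here).
        updated.append((W - step) / np.sqrt(1.0 + eta ** 2))
    return updated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-layer linear network trained with the sketch above.
    W1, W2 = rng.standard_normal((16, 8)), rng.standard_normal((1, 16))
    X, y = rng.standard_normal((128, 8)), rng.standard_normal((128, 1))
    for _ in range(200):
        h = X @ W1.T                  # hidden activations
        pred = h @ W2.T               # network output
        err = (pred - y) / len(X)     # residual scaled by batch size
        g2 = err.T @ h                # gradient of squared error w.r.t. W2
        g1 = (err @ W2).T @ X         # gradient of squared error w.r.t. W1
        W1, W2 = relative_update([W1, W2], [g1, g2], eta=0.01)
    print("final mean squared error:", float(np.mean((X @ W1.T @ W2.T - y) ** 2)))

  Here the single hyper-parameter eta acts as a relative step size per layer, which is the property that could make a rule of this form usable without a learning rate grid search.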
    5 Citations

    Learning by Turning: Neural Architecture Aware Optimisation
    Learning compositional functions via multiplicative weight updates (5 citations)
    AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients (13 citations; highly influenced)
    High-Performance Large-Scale Image Recognition Without Normalization (1 citation)

    References

    Showing 1-10 of 59 references
    On the difficulty of training recurrent neural networks (3,000 citations)
    Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice (131 citations; highly influential)
    Optimization for deep learning: theory and algorithms (Ruoyu Sun, ArXiv, 2019; 42 citations)
    Measuring and regularizing networks in function space (14 citations)
    No more pesky learning rates (359 citations)
    Path-SGD: Path-Normalized Optimization in Deep Neural Networks (172 citations)
    Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (1,016 citations)
    Geometry of Neural Network Loss Surfaces via Random Matrix Theory (93 citations)