If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks

@article{Pretorius2019IfDL,
  title={If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks},
  author={Arnu Pretorius and Elan Van Biljon and B. Niekerk and Ryan Eloff and Matthew Reynard and S. James and Benjamin Rosman and H. Kamper and S. Kroon},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.05725}
}
  • Published 2019
  • Mathematics, Computer Science
  • ArXiv
  • Recent work in signal propagation theory has shown that dropout limits the depth to which information can propagate through a neural network. In this paper, we investigate the effect of initialisation on training speed and generalisation for ReLU networks within this depth limit. We ask the following research question: given that critical initialisation is crucial for training at large depth, if dropout limits the depth at which networks are trainable, does initialising critically still matter? (The criticality condition in question is sketched below.)
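
The research question above hinges on the notion of critical initialisation. As a rough illustration (an assumption drawn from the mean-field signal-propagation analyses this work builds on, not a statement taken from the paper itself): for a ReLU layer followed by inverted dropout with keep probability p, the variance map is q_{l+1} = (σ_w² / (2p)) q_l + σ_b², so criticality requires σ_w² = 2p, i.e. He initialisation scaled by the keep probability. A minimal NumPy sketch under that assumption (function names are illustrative, not from the paper):

    import numpy as np

    def critical_sigma_w(keep_prob):
        """Critical weight scale for a ReLU layer with inverted dropout.

        Assumes the mean-field variance map q_{l+1} = (sigma_w^2 / (2 p)) q_l + sigma_b^2,
        so criticality requires sigma_w^2 = 2 p (He initialisation scaled by keep_prob).
        """
        return np.sqrt(2.0 * keep_prob)

    def init_layer(fan_in, fan_out, keep_prob, rng=None):
        """Draw a weight matrix at the (assumed) critical point for ReLU + dropout."""
        rng = np.random.default_rng() if rng is None else rng
        std = critical_sigma_w(keep_prob) / np.sqrt(fan_in)   # per-weight std: sqrt(2 p / fan_in)
        W = rng.normal(0.0, std, size=(fan_in, fan_out))
        b = np.zeros(fan_out)                                  # zero bias keeps the variance fixed point
        return W, b

    def forward(x, layers, keep_prob, rng=None, train=True):
        """Propagate a batch through ReLU layers with inverted dropout."""
        rng = np.random.default_rng() if rng is None else rng
        h = x
        for W, b in layers:
            h = np.maximum(h @ W + b, 0.0)                     # ReLU activation
            if train:
                mask = rng.binomial(1, keep_prob, size=h.shape)
                h = h * mask / keep_prob                       # inverted dropout
        return h

Setting keep_prob = 1.0 recovers plain He initialisation (σ_w² = 2), the usual dropout-free critical point for ReLU networks.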
