Corpus ID: 211259030

The Early Phase of Neural Network Training

@article{Frankle2020TheEP,
  title={The Early Phase of Neural Network Training},
  author={Jonathan Frankle and D. Schwab and Ari S. Morcos},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.10365}
}
Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of…
44 Citations
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
  • Stanislaw Jastrzebski, D. Arpit, +6 authors K. Geras
  • Computer Science, Mathematics
  • ArXiv
  • 2020
  • 1
Emerging Paradigms of Neural Network Pruning
  • 2
  • Highly Influenced
Understanding the Role of Training Regimes in Continual Learning
  • 14
Speeding-up pruning for Artificial Neural Networks: Introducing Accelerated Iterative Magnitude Pruning
  • 1
The large learning rate phase of deep learning: the catapult mechanism
  • 36
Roulette: A Pruning Framework to Train a Sparse Neural Network From Scratch

References

SHOWING 1-10 OF 23 REFERENCES
Learning both Weights and Connections for Efficient Neural Network
  • 3,057
Understanding deep learning requires rethinking generalization
  • 2,606
Wide Residual Networks
  • 2,995
Why Does Unsupervised Pre-training Help Deep Learning?
  • 1,343
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • 22,775
Rethinking the Value of Network Pruning
  • 470
Critical Learning Periods in Deep Neural Networks
  • 36
  • Highly Influential
Deep Residual Learning for Image Recognition
  • 65,751
  • Highly Influential
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
  • 178