Generalization in Deep Networks: The Role of Distance from Initialization
@article{Nagarajan2019GeneralizationID,
  title   = {Generalization in Deep Networks: The Role of Distance from Initialization},
  author  = {Vaishnavh Nagarajan and J. Z. Kolter},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1901.01672}
}
Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that depends on a given random initialization of the network, and not just on the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact …
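The quantity the title refers to can be made concrete. Below is a minimal sketch, not from the paper, of measuring a network's l2 distance from its random initialization over the course of SGD training; it assumes PyTorch, and the model, optimizer settings, and synthetic batch are purely illustrative.

```python
import torch

def distance_from_init(model: torch.nn.Module, init_state: dict) -> float:
    """l2 distance between the current parameters and a saved initialization."""
    sq = 0.0
    for name, p in model.named_parameters():
        sq += (p.detach() - init_state[name]).pow(2).sum().item()
    return sq ** 0.5

# Illustrative two-layer network; not the architecture used in the paper.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
# Snapshot the random initialization before any SGD updates.
init_state = {k: v.clone() for k, v in model.state_dict().items()}

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # stand-in batch
for step in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

print(f"distance from initialization: {distance_from_init(model, init_state):.3f}")
```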
36 Citations
Uniform convergence may be unable to explain generalization in deep learning
- Computer Science, Mathematics · NeurIPS 2019 · 77 citations
Understanding training and generalization in deep learning by Fourier analysis
- Computer Science, Mathematics · ArXiv 2018 · 34 citations
Understanding Generalization of Deep Neural Networks Trained with Noisy Labels
- Computer Science · ArXiv 2019 · 10 citations
The intriguing role of module criticality in the generalization of deep networks
- Computer Science, Mathematics · ICLR 2020 · 10 citations
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
- Computer Science, Mathematics · NeurIPS 2018 · 317 citations
Implicit Regularization in Over-parameterized Neural Networks
- Mathematics, Computer Science · ArXiv 2019 · 10 citations
Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
- Computer Science, Mathematics · ICLR 2019 · 39 citations
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
- Computer Science, Mathematics · ICLR 2020 · 32 citations
The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
- Computer Science, Mathematics · ICML 2020 · 43 citations
References
Showing 1-10 of 15 references
A Closer Look at Memorization in Deep Networks
- Computer Science, Mathematics · ICML 2017 · 486 citations · Highly Influential
Understanding the difficulty of training deep feedforward neural networks
- Computer Science, Mathematics · AISTATS 2010 · 9,450 citations
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
- Computer Science, Mathematics · NIPS 2017 · 355 citations
Train faster, generalize better: Stability of stochastic gradient descent
- Computer Science, Mathematics · ICML 2016 · 552 citations
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
- Computer Science, Mathematics · UAI 2017 · 298 citations
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
- Computer Science, Mathematics · ICLR 2017 · 1,158 citations
Exploring Generalization in Deep Learning
- Computer Science, Mathematics · NIPS 2017 · 491 citations · Highly Influential
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
- Computer Science, Mathematics · ICLR 2015 · 261 citations