Corpus ID: 51833762

Generalization in Deep Networks: The Role of Distance from Initialization

@article{Nagarajan2019GeneralizationID,
  title={Generalization in Deep Networks: The Role of Distance from Initialization},
  author={Vaishnavh Nagarajan and J. Z. Kolter},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.01672}
}
  • Fields of study: Computer Science, Mathematics
  • Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that depends on a given random initialization of the network, and not just on the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact…
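  The central quantity in the paper is the distance from initialization: the ℓ2 norm of the difference between a network's trained parameters and their randomly initialized values. As a minimal sketch of how one might measure it (assuming PyTorch; the helper name l2_distance_from_init and the toy model are illustrative, not from the paper):

      import torch
      import torch.nn as nn

      def l2_distance_from_init(model: nn.Module, init_state: dict) -> float:
          # Sum of squared differences between the current parameters and
          # their values at initialization, then take the square root.
          sq = 0.0
          for name, p in model.named_parameters():
              sq += (p.detach() - init_state[name]).pow(2).sum().item()
          return sq ** 0.5

      # Snapshot the random initialization before training starts.
      model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10))
      init_state = {k: v.clone() for k, v in model.state_dict().items()}
      # ... train the model with SGD here ...
      print(l2_distance_from_init(model, init_state))  # 0.0 until training moves the weights

  Under the paper's framing, tracking this quantity across network widths is what ties effective model capacity to the specific random initialization rather than to the parameter count alone.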
    36 Citations
    • Uniform convergence may be unable to explain generalization in deep learning (77 citations)
    • Understanding training and generalization in deep learning by Fourier analysis (34 citations)
    • Understanding Generalization of Deep Neural Networks Trained with Noisy Labels (10 citations)
    • The intriguing role of module criticality in the generalization of deep networks (10 citations)
    • Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data (317 citations)
    • Implicit Regularization in Over-parameterized Neural Networks (10 citations)
    • Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience (39 citations)
    • Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint (24 citations)

    References

    (Partial list; the paper has 15 references in total.)
    • A Closer Look at Memorization in Deep Networks (486 citations; highly influential)
    • Understanding deep learning requires rethinking generalization (2,419 citations)
    • Understanding the difficulty of training deep feedforward neural networks (9,450 citations)
    • Train faster, generalize better: Stability of stochastic gradient descent (552 citations)
    • Sharp Minima Can Generalize For Deep Nets (321 citations)
    • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (1,158 citations)
    • Exploring Generalization in Deep Learning (491 citations; highly influential)