Corpus ID: 51833762

Generalization in Deep Networks: The Role of Distance from Initialization

@article{Nagarajan2019GeneralizationID,
title={Generalization in Deep Networks: The Role of Distance from Initialization},
author={Vaishnavh Nagarajan and J. Z. Kolter},
journal={ArXiv},
year={2019},
volume={abs/1901.01672}
}
• Published 2019
• Computer Science, Mathematics
• ArXiv
• Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that is dependent on a given random initialization of the network and not just the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact…
36 Citations
