Corpus ID: 6212000

Understanding deep learning requires rethinking generalization

@article{Zhang2017UnderstandingDL,
  title={Understanding deep learning requires rethinking generalization},
  author={Chiyuan Zhang and Samy Bengio and Moritz Hardt and Benjamin Recht and Oriol Vinyals},
  journal={ArXiv},
  year={2017},
  volume={abs/1611.03530}
}
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. [...] We interpret our experimental findings by comparison with traditional models.

Citations

Understanding deep learning (still) requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity.
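As a concrete illustration of the randomization test these findings rest on, the following is a minimal sketch, assuming PyTorch and torchvision are available; the small CNN, optimizer settings, and epoch count are illustrative choices, not the authors' exact setup.

# Minimal sketch of the randomization test: replace CIFAR-10 labels with
# uniformly random ones and check that a small CNN can still drive the
# training error toward zero. Architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

torch.manual_seed(0)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
# Overwrite every label with an independent uniform draw from the 10 classes.
train_set.targets = torch.randint(0, 10, (len(train_set),)).tolist()
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(128 * 8 * 8, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):              # enough passes to memorize the noise
    correct, total = 0, 0
    for x, y in loader:
        opt.zero_grad()
        logits = model(x)
        loss_fn(logits, y).backward()
        opt.step()
        correct += (logits.argmax(1) == y).sum().item()
        total += y.numel()
    print(f"epoch {epoch}: train accuracy on random labels = {correct / total:.3f}")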
Deep Nets Don't Learn via Memorization
TLDR
It is established that there are qualitative differences when learning noise vs. natural datasets, and that for appropriately tuned explicit regularization, e.g. dropout, DNN training performance can be degraded on noise datasets without compromising generalization on real data.
What can linearized neural networks actually say about generalization?
TLDR
It is found that during training, deep networks increase the alignment of their empirical NTK with the target task, which explains why linear approximations at the end of training can better explain the dynamics of deep networks.
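For concreteness, kernel-target alignment is typically measured as the cosine similarity between a kernel Gram matrix and the ideal kernel built from the labels; the numpy sketch below uses that standard definition for binary labels, which may differ in detail from the exact alignment measure used in the paper above.

# Kernel-target alignment: cosine similarity between a kernel Gram matrix K
# and the ideal kernel y y^T built from the labels (standard definition; the
# cited paper's exact measure may differ in detail).
import numpy as np

def kernel_target_alignment(K, y):
    """K: (n, n) kernel matrix, y: (n,) labels in {-1, +1} (binary case)."""
    Y = np.outer(y, y)                       # ideal kernel y y^T
    num = np.sum(K * Y)                      # Frobenius inner product <K, Y>
    den = np.linalg.norm(K) * np.linalg.norm(Y)
    return num / den

# Toy usage: a Gaussian kernel on linearly separable 1-D data.
x = np.linspace(-1, 1, 20)
y = np.sign(x)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
print(kernel_target_alignment(K, y))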
Uniform convergence may be unable to explain generalization in deep learning
TLDR
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
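For context, a textbook uniform convergence bound of the kind questioned above, for a loss bounded in [0, 1] (stated here as standard background, not as a bound from the cited paper): with probability at least 1 - \delta over an i.i.d. sample of size n, simultaneously for every hypothesis h in the class \mathcal{H},

L(h) \le \hat{L}_n(h) + 2\,\mathfrak{R}_n(\mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2n}},

where \hat{L}_n(h) is the empirical risk and \mathfrak{R}_n(\mathcal{H}) is the Rademacher complexity of \mathcal{H}. The overparameterized regime is precisely where complexity terms like \mathfrak{R}_n(\mathcal{H}) can be too large for such bounds to be informative, which is the gap the paper above examines.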
Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent
TLDR
A general analysis of the generalization performance as a function of data distribution and convolutional filter size is provided, given gradient descent as the optimization algorithm, and the results are interpreted using concrete examples.
Robustness to Pruning Predicts Generalization in Deep Neural Networks
2020
Why over-parameterized neural networks generalize as well as they do is a central concern of theoretical analysis in machine learning today. Following Occam's razor, it has long been suggested that [...]
Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications
TLDR
It is shown that pure SGD tends to converge to minima that have better generalization performance in multiple natural language processing (NLP) tasks, and that a neural network's finite learning capability does not impact the intrinsic nature of SGD's implicit regularization effect.
Refining the Structure of Neural Networks Using Matrix Conditioning
TLDR
This work proposes a practical method that employs matrix conditioning to automatically design the structure of the layers of a feed-forward network, by first adjusting the proportion of neurons among the layers and then scaling the size of the network up or down.
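As a rough illustration of the quantity involved, the condition number of each layer's weight matrix can be inspected directly; this numpy sketch only computes per-layer condition numbers and does not reproduce the cited method's rules for redistributing neurons or rescaling the network.

# Per-layer condition numbers of a feed-forward network's weight matrices:
# cond(W) = sigma_max / sigma_min; large values flag poorly scaled layers.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 512, 128, 10]          # illustrative feed-forward shape
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights):
    s = np.linalg.svd(W, compute_uv=False)
    print(f"layer {i}: shape {W.shape}, condition number {s.max() / s.min():.1f}")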
To understand deep learning we need to understand kernel learning
TLDR
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.
Sensitivity and Generalization in Neural Networks: an Empirical Study
TLDR
It is found that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that this sensitivity measure correlates well with generalization.
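A minimal PyTorch sketch of that sensitivity measure, the Frobenius norm of the input-output Jacobian evaluated at a single input; the model, the input, and the choice of norm here are illustrative assumptions rather than the paper's exact protocol.

# Frobenius norm of the input-output Jacobian at a given input, the
# sensitivity measure referenced above; model and input are illustrative.
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 3))
x = torch.randn(10)

J = jacobian(lambda inp: model(inp), x)   # shape (3, 10): d outputs / d inputs
print("Jacobian Frobenius norm:", torch.linalg.norm(J).item())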

References

Showing 1-10 of 43 references
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
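For reference, the core training-time transform from this paper, written out in numpy for a fully connected layer; the running statistics used at inference time and the convolutional (per-channel) variant are omitted.

# Batch normalization for one mini-batch of activations: normalize each
# feature by the batch statistics, then scale and shift by the learned
# parameters gamma and beta.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features) activations; gamma, beta: (features,) parameters."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))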
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
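A short numpy sketch of the dropout transform; this is the common inverted-dropout formulation that rescales surviving units during training, whereas the original paper keeps units with probability p and rescales the weights at test time, the two being equivalent in expectation.

# Inverted dropout: zero each unit with probability p during training and
# rescale the survivors, so no change is needed at test time.
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p, rescale the rest by 1 / (1 - p)."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p        # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)            # rescale so the expectation is unchanged

h = np.ones((2, 8))
print(dropout(h, p=0.5))                   # about half the units zeroed, survivors doubled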
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Deep vs. shallow networks: An approximation theory perspective
TLDR
A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable. [...]
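For reference, the convex, smooth case of this result is usually quoted in roughly the following form (constants and exact assumptions are from memory and should be checked against the paper): if the loss is convex, L-Lipschitz, and smooth, and the step sizes \alpha_t are small enough, then T steps of SGM are uniformly stable with parameter about (2L^2/n) \sum_{t=1}^{T} \alpha_t, which in turn bounds the expected generalization gap:

\mathbb{E}\big[L(w_T) - \hat{L}_n(w_T)\big] \;\le\; \epsilon_{\mathrm{stab}} \;\lesssim\; \frac{2L^2}{n} \sum_{t=1}^{T} \alpha_t.

Fewer iterations and smaller step sizes therefore directly shrink the bound, which is the sense in which training faster generalizes better.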
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
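A minimal PyTorch sketch of the basic residual block this framework is built from: the block computes F(x) + x, so each stage only needs to learn a residual correction to the identity mapping. Channel counts are arbitrary here, and the downsampling variant with a projection shortcut is omitted.

# A basic residual block: two conv-BN stages plus an identity skip connection.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)             # skip connection: F(x) + x

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)             # torch.Size([1, 64, 32, 32])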
Convolutional Rectifier Networks as Generalized Tensor Decompositions
TLDR
Developing effective methods for training convolutional arithmetic circuits may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks, a direction that has so far been overlooked by practitioners.
On the Computational Efficiency of Training Neural Networks
TLDR
This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.