• Corpus ID: 225040271

In Search of Robust Measures of Generalization

Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work… 


Generalization bounds for deep learning
Desiderata for techniques that predict generalization errors for deep learning models in supervised learning are introduced, and a marginal-likelihood PAC-Bayesian bound is derived that fulfills desiderata 1-3 and 5.
Why Flatness Correlates With Generalization For Deep Neural Networks
It is argued that local flatness measures correlate with generalization because they are local approximations to a global property, the volume of the set of parameters mapping to a specific function, equivalent to the Bayesian prior upon initialization.
Learning While Dissipating Information: Understanding the Generalization Capability of SGLD
An algorithm-dependent generalization bound is derived by analyzing SGLD through an information-theoretic lens and it is demonstrated that it can predict the behavior of the true generalization gap.
Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis
  • Aleksandar Vakanski, Min Xian
  • Computer Science
    2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2021
This paper presents an empirical study of the correlation between 25 complexity measures and the generalization ability of deep learning classifiers for breast ultrasound images; the results indicate that PAC-Bayes flatness and path-norm measures give the most consistent explanation across the combinations of models and data.
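Path norm, one of the two measures singled out above, is cheap to compute for a bias-free ReLU MLP. The sketch below assumes the common l2 definition (the square root of the sum, over every input-to-output path, of the product of the squared weights on that path); the function name `l2_path_norm` and the nested-list weight layout are illustrative choices, not taken from the study, whose exact variant may differ.

```python
import math


def l2_path_norm(weights):
    """l2 path norm of a bias-free ReLU MLP (illustrative helper).

    weights[k][i][j] is the weight from unit j of layer k to unit i of layer k + 1.
    Squaring every weight and propagating an all-ones input sums, over every
    input-to-output path, the product of the squared weights on that path.
    No ReLU is applied because all propagated values are already non-negative.
    """
    n_inputs = len(weights[0][0])
    h = [1.0] * n_inputs
    for W in weights:
        h = [sum(w * w * x for w, x in zip(row, h)) for row in W]
    return math.sqrt(sum(h))


# Two input-to-output paths with weight products 1*3 and 2*3.
print(l2_path_norm([[[1.0, 2.0]], [[3.0]]]))  # sqrt(9 + 36) = sqrt(45)
```

Propagating through squared weights avoids enumerating paths explicitly, whose number grows multiplicatively with depth.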
Ranking Deep Learning Generalization using Label Variation in Latent Geometry Graphs
This work proposes exploiting Latent Geometry Graphs (LGGs) to represent the latent spaces of trained DNN architectures by connecting samples that yield similar latent representations at a given layer of the considered DNN.
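One way to turn such a graph into a score is to measure how often connected samples disagree in label. The sketch below is a hypothetical stand-in, not the paper's construction: it assumes a k-nearest-neighbour graph under Euclidean distance on one layer's representations and reports the fraction of edges whose endpoints carry different labels (lower values suggest the classes are better separated at that layer). The name `knn_label_variation` is illustrative.

```python
import math


def knn_label_variation(features, labels, k=1):
    """Fraction of directed k-nearest-neighbour edges joining differently labelled samples.

    Hypothetical score: each sample is linked to its k nearest neighbours
    (Euclidean distance) in the latent representation of one layer.
    """
    n = len(features)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and len(features) - 1")
    mismatched = 0
    edges = 0
    for i in range(n):
        # Sort the other samples by distance to sample i and keep the k closest.
        neighbours = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )[:k]
        for _, j in neighbours:
            edges += 1
            mismatched += labels[i] != labels[j]
    return mismatched / edges


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(knn_label_variation(points, [0, 0, 1, 1]))  # 0.0: every neighbour shares the label
print(knn_label_variation(points, [0, 1, 0, 1]))  # 1.0: every neighbour differs
```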
On Margins and Derandomisation in PAC-Bayes
This work gives a general recipe for derandomising PAC-Bayesian bounds using margins, extends it to partially derandomised predictors in which only some of the randomness is removed, and obtains bounds in cases where the concentration properties of the predictors are otherwise poor.
Methods for Estimating and Improving Robustness of Language Models
This work surveys diverse research directions that estimate a model's generalisation ability and finds that incorporating some of these measures into the training objective improves the distributional robustness of neural models.
Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
This study argues that the hypothesis set SGD explores is trajectory-dependent and thus may provide a tighter bound over its Rademacher complexity, and derives a novel generalization bound for deep neural networks.
Connecting Optimization and Generalization via Gradient Flow Path Length
This work proposes a framework that connects optimization with generalization by analyzing the generalization error in terms of the length of the optimization trajectory of the gradient algorithm after convergence; it shows that short optimization paths after convergence are associated with good generalization, which matches the numerical results.
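The path length in question can be approximated in discrete time by summing the step sizes of gradient descent. This is only an illustrative sketch under stated assumptions: plain gradient descent with a fixed learning rate stands in for gradient flow, `gd_path_length` is a hypothetical helper, and the generalization bound itself is not reproduced.

```python
import math


def gd_path_length(grad, w0, lr, steps):
    """Run plain gradient descent; return (final weights, trajectory length).

    The length is the sum of Euclidean step sizes ||w_{t+1} - w_t||, a
    discrete-time proxy for the length of the continuous gradient-flow path.
    """
    w = list(w0)
    length = 0.0
    for _ in range(steps):
        g = grad(w)
        new_w = [wi - lr * gi for wi, gi in zip(w, g)]
        length += math.dist(w, new_w)
        w = new_w
    return w, length


# Quadratic loss 0.5 * ||w||^2 has gradient w; each step halves w along a straight line.
w_final, length = gd_path_length(lambda w: w, [3.0, 4.0], lr=0.5, steps=50)
print(length)  # approximately 5.0, the distance from [3, 4] to the minimizer
```

On this toy problem the trajectory is a straight line, so its length equals the distance to the minimizer; on non-convex losses the two can differ substantially.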
Understanding Generalization via Leave-One-Out Conditional Mutual Information
We study the mutual information between (certain summaries of) the output of a learning algorithm and its n training data, conditional on a supersample of n + 1 i.i.d. data from which the training…


Fantastic Generalization Measures and Where to Find Them
This work presents the first large-scale study of generalization in deep networks, investigating more than 40 complexity measures taken from both theoretical bounds and empirical studies, and shows surprising failures of some measures as well as measures that are promising for further research.
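Studies of this kind score a complexity measure by how well it ranks trained models by their generalization gap. A minimal sketch of one such score, Kendall's rank correlation in its plain tau-a form (ties count as neither concordant nor discordant), is below; that study's own protocol, which also uses a granulated variant across hyperparameter changes, is not reproduced, and `kendall_tau` is an illustrative name.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least two items")
    score = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            score += 1  # the pair is ordered the same way by both sequences
        elif s < 0:
            score -= 1  # the pair is ordered oppositely
    return score / (n * (n - 1) / 2)


measure = [1.0, 2.0, 3.0, 4.0]   # complexity measure for four trained models
gap = [0.1, 0.3, 0.2, 0.4]       # their generalization gaps
print(kendall_tau(measure, gap))  # one discordant pair out of six: (5 - 1) / 6
```

A value of 1 means the measure orders the models exactly as their generalization gaps do; -1 means it orders them exactly in reverse.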
Uniform convergence may be unable to explain generalization in deep learning
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Predicting the Generalization Gap in Deep Networks with Margin Distributions
This paper proposes a measure based on the margin distribution, i.e., the distances of training points to the decision boundary, and finds that it is necessary to use margin distributions at multiple layers of a deep network.
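As a simplified sketch of the idea, the code below computes the output-layer margin of each training point (true-class score minus the largest other score) and summarizes the distribution by its quartiles. It assumes raw logits and omits the normalization and hidden-layer distances that the paper's measure relies on; the helper names are hypothetical.

```python
import statistics


def output_margins(logits, labels):
    """Per-example output margin: true-class score minus the largest other score.

    Negative values mark misclassified points.
    """
    margins = []
    for scores, y in zip(logits, labels):
        runner_up = max(s for j, s in enumerate(scores) if j != y)
        margins.append(scores[y] - runner_up)
    return margins


def margin_signature(margins):
    """Summarize the margin distribution by its three quartiles."""
    return statistics.quantiles(margins, n=4, method="inclusive")


logits = [[2.0, 0.5, 0.1], [0.2, 1.0, 3.0], [1.0, 1.5, 0.0]]
m = output_margins(logits, [0, 2, 0])
print(m)  # [1.5, 2.0, -0.5]; the third point is misclassified
print(margin_signature(m))
```

Summary statistics like these turn a whole distribution into a fixed-length feature vector, which can then be related to the generalization gap across many trained models.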
A Surprising Linear Relationship Predicts Test Performance in Deep Networks
It is shown that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that have the same architecture, the same meta parameters and the same training error: one can either pretrain the networks with different levels of "corrupted" data or simply initialize the networks with weights of different Gaussian standard deviations.
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
By optimizing the PAC-Bayes bound directly, this work extends the approach of Langford and Caruana (2001) and obtains nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples.
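To make the quantity concrete, the sketch below evaluates (rather than optimizes) a PAC-Bayes bound for a given posterior Q and prior P, both assumed to be diagonal Gaussians over the weights, with P fixed before seeing the training data. It assumes Maurer's form of the PAC-Bayes-kl bound, kl(e_hat || e) <= (KL(Q||P) + ln(2 sqrt(m) / delta)) / m, where e_hat is the empirical error of Q on m examples; the variant used in the paper may differ in its constants, and all function names are illustrative.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), with 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0 if q == p else math.inf
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q, c, iters=100):
    """Upper end of {p in [q, 1] : kl(q || p) <= c}, found by bisection.

    Returns the upper bracket, so the result is never below the true inverse.
    """
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi


def diag_gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for diagonal Gaussians given per-coordinate means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (sq ** 2 + (mq - mp) ** 2) / sp ** 2 - 1.0 + 2.0 * math.log(sp / sq)
    return 0.5 * total


def pac_bayes_kl_bound(emp_error, kl_qp, m, delta):
    """Bound on the true error of Q holding with probability >= 1 - delta (Maurer's form)."""
    c = (kl_qp + math.log(2.0 * math.sqrt(m) / delta)) / m
    return kl_inverse_upper(emp_error, c)


kl = diag_gaussian_kl([0.1, -0.2], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
print(pac_bayes_kl_bound(emp_error=0.03, kl_qp=kl, m=10000, delta=0.05))
```

Inverting the binary kl, rather than using a square-root relaxation, gives a noticeably tighter bound when the empirical error is small.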
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
Invariant Models for Causal Transfer Learning
This work relaxes the usual covariate shift assumption and assumes that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks.
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
This work presents a novel complexity measure based on unit-wise capacities, which yields a tighter generalization bound for two-layer ReLU networks, together with a matching lower bound for the Rademacher complexity that improves over previous capacity lower bounds for neural networks.
Stronger generalization bounds for deep nets via a compression approach
These results provide some theoretical justification for the widespread empirical success in compressing deep nets and give generalization bounds that are orders of magnitude better in practice.
Spectrally-normalized margin bounds for neural networks
A margin-based multiclass generalization bound that scales with the network's margin-normalized spectral complexity is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task and that the presented bound is sensitive to this complexity.
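A central ingredient here is the network's Lipschitz constant, which for a ReLU network (whose activations are 1-Lipschitz) is upper-bounded by the product of the spectral norms of its weight matrices. The sketch below computes that product with power iteration; it deliberately leaves out the correction factor and margin normalization that make up the full spectral complexity, and the helper names are hypothetical.

```python
import math
import random


def spectral_norm(W, iters=100, seed=0):
    """Largest singular value of W (a list of rows), by power iteration on W^T W."""
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]                      # u = W v
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0  # W v vanished: W is zero (or v started in its null space)
        v = [x / norm_v for x in v]
    # v now approximates the top right singular vector, so ||W v|| approximates sigma_max.
    Wv = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in Wv))


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: an l2 Lipschitz bound for a ReLU network."""
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound


layers = [[[3.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
print(lipschitz_upper_bound(layers))  # approximately 6.0
```

Power iteration needs only matrix-vector products, which is why it is the usual choice for large weight matrices; the product bound can be loose because the top singular directions of consecutive layers need not align.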