# Understanding the Failure Modes of Out-of-Distribution Generalization

@article{Nagarajan2020UnderstandingTF, title={Understanding the Failure Modes of Out-of-Distribution Generalization}, author={Vaishnavh Nagarajan and Anders Andreassen and Behnam Neyshabur}, journal={ArXiv}, year={2020}, volume={abs/2010.15775} }

Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training, resulting in poor test-time accuracy. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way *even* in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained…

## 85 Citations

### Model-Based Domain Generalization

- Computer Science, NeurIPS
- 2021

This paper proposes a novel approach for the domain generalization problem called Model-Based Domain Generalization, which uses unlabeled data from the training domains to learn multi-modal domain transformation models that map data from one training domain to any other domain.

### Contemplating real-world object classification

- Computer Science, ICLR
- 2021

The results indicate that limiting the object area as much as possible leads to consistent improvements in accuracy and robustness, and show that ObjectNet remains a challenging test platform for evaluating the generalization ability of models.

### Does Invariant Risk Minimization Capture Invariance?

- Mathematics, AISTATS
- 2021

It is shown that the Invariant Risk Minimization (IRM) formulation can fail to capture “natural” invariances, at least when used in its practical “linear” form, and even on very simple problems which directly follow the motivating examples for IRM.
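The "practical linear form" referred to here is the IRMv1 objective, which replaces IRM's bi-level problem with a gradient penalty measured through a fixed scalar "dummy" classifier. As a rough illustration (not taken from either paper), a minimal sketch of that penalty for a squared loss and a scalar dummy classifier $w$ evaluated at $w = 1$ might look like:

```python
import numpy as np

def irmv1_penalty(logits, y):
    """Illustrative IRMv1-style penalty for squared loss.

    Per-environment risk through a scalar dummy classifier w:
        R(w) = mean((w * logits - y)^2)
    The penalty is the squared gradient dR/dw evaluated at w = 1,
    which vanishes only when the classifier is locally optimal
    for this environment.
    """
    logits = np.asarray(logits, dtype=float)
    y = np.asarray(y, dtype=float)
    grad = np.mean(2.0 * (logits - y) * logits)  # dR/dw at w = 1
    return grad ** 2
```

Summing this penalty over training environments and adding it to the average risk yields the IRMv1 objective that the AISTATS paper analyzes.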

### I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch

- Computer Science, ArXiv
- 2020

This work compares commonly used audio-to-audio losses on a synthetic benchmark that measures the pitch distance between two stationary sinusoids, and suggests that significant progress can be made in self-supervised audio learning by improving current losses.

### Efficient debiasing with contrastive weight pruning

- Computer Science, ArXiv
- 2022

Neural networks are often biased toward spuriously correlated features that provide misleading statistical evidence and do not generalize. This raises a fundamental question: “Does an optimal unbiased…

### Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

- Computer Science, ICML
- 2021

A functional modular probing method is used to analyze deep model structures under the OOD setting, demonstrating that even biased models (which focus on spurious correlations) still contain unbiased functional subnetworks.

### Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors

- Computer Science, ArXiv
- 2022

It is empirically shown that sample-wise adversarial training (AT) yields limited improvement in OOD performance, so two AT variants with low-rank structures are proposed for training OOD-robust models; these provide evidence that adversarial perturbations with universal structures can enhance robustness against the large data distribution shifts common in OOD scenarios.

### Evaluating and Improving Robustness of Self-Supervised Representations to Spurious Correlations

- Computer Science
- 2022

This work shows that classical approaches to combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representations, and proposes removing spurious information from representations during pre-training by pruning or re-initializing later layers of the encoder.

### Towards out of distribution generalization for problems in mechanics

- Computer Science, Computer Methods in Applied Mechanics and Engineering
- 2022

### Improving Multi-Task Generalization via Regularizing Spurious Correlation

- Computer Science, ArXiv
- 2022

Experiments show that MT-CRL enhances MTL model performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2, and indicate that it can indeed alleviate the spurious-correlation problem.

## References

Showing 1–10 of 47 references

### Uniform convergence may be unable to explain generalization in deep learning

- Computer Science, NeurIPS
- 2019

Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.

### Out-of-Distribution Generalization via Risk Extrapolation (REx)

- Computer Science, ICML
- 2021

This work introduces the principle of Risk Extrapolation (REx), and shows conceptually how this principle enables extrapolation, and demonstrates the effectiveness and scalability of instantiations of REx on various OoD generalization tasks.

### An Investigation of Why Overparameterization Exacerbates Spurious Correlations

- Computer Science, ICML
- 2020

The analysis leads to a counterintuitive approach of subsampling the majority group, which empirically achieves low minority error in the overparameterized regime, even though the standard approach of upweighting the minority fails.
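The subsampling idea is simple to state: discard majority-group examples until every group is the size of the smallest one, rather than upweighting the minority. A minimal sketch (helper name and signature are ours, for illustration only):

```python
import numpy as np

def subsample_to_smallest_group(X, y, groups, seed=0):
    """Subsample each group down to the size of the smallest group.

    Unlike upweighting, this reduces the effective dataset size, yet
    the cited analysis finds it achieves low minority-group error in
    the overparameterized regime.
    """
    rng = np.random.default_rng(seed)
    n_min = min(np.sum(groups == g) for g in np.unique(groups))
    keep = []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        keep.extend(rng.choice(members, size=n_min, replace=False))
    keep = np.sort(np.asarray(keep))
    return X[keep], y[keep], groups[keep]
```

After subsampling, every group contributes equally many examples, so the spurious majority-group correlation no longer dominates the training signal.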

### Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

- Computer Science, ArXiv
- 2019

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even when it is not needed for average generalization; a stochastic optimization algorithm with convergence guarantees is also introduced to efficiently train group DRO models.
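At its core, the group DRO objective replaces the average loss with the worst per-group average loss. A minimal sketch of that objective (function name ours, for illustration):

```python
import numpy as np

def worst_group_loss(losses, groups, n_groups):
    """Group DRO objective: average the loss within each group,
    then take the maximum over groups, so training focuses on the
    worst-performing group rather than the average example."""
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    group_means = np.array(
        [losses[groups == g].mean() for g in range(n_groups)]
    )
    return group_means.max()
```

Minimizing this max-over-groups quantity is what distinguishes group DRO from empirical risk minimization, which would simply average all three losses below.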

### In Search of Lost Domain Generalization

- Computer Science, ICLR
- 2021

This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.

### Adversarial Spheres

- Computer Science, ICLR
- 2018

A fundamental tradeoff between the test error and the average distance to the nearest error is shown, proving that any model which misclassifies a small constant fraction of a sphere is vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$.

### Generalization in Deep Networks: The Role of Distance from Initialization

- Computer Science, ArXiv
- 2019

Empirical evidence is provided that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the distance from initialization, alongside theoretical arguments that highlight the need for initialization-dependent notions of model capacity.

### The Implicit Bias of Gradient Descent on Separable Data

- Computer Science, J. Mach. Learn. Res.
- 2018

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…

### Towards Shape Biased Unsupervised Representation Learning for Domain Generalization

- Computer Science, ArXiv
- 2019

This work proposes a learning framework that improves the shape-bias property of self-supervised methods by integrating domain diversification and jigsaw puzzles, and shows that this framework outperforms state-of-the-art domain generalization methods by a large margin.

### mixup: Beyond Empirical Risk Minimization

- Computer Science, ICLR
- 2018

This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
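The convex combination at the heart of mixup is short enough to sketch directly: sample a mixing coefficient from a Beta distribution and blend both the inputs and their one-hot labels (the `alpha` default below is a common choice in the paper's experiments, but the function itself is our illustrative sketch, not the authors' code):

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend one pair of examples and their one-hot labels (mixup).

    lam ~ Beta(alpha, alpha) controls the blend; training on such
    convex combinations encourages linear behavior between examples.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

In practice this is applied per minibatch by blending each batch with a shuffled copy of itself, and the mixed labels are used directly in a cross-entropy loss.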