• Corpus ID: 225103201

Understanding the Failure Modes of Out-of-Distribution Generalization

@article{Nagarajan2020UnderstandingTF,
  title={Understanding the Failure Modes of Out-of-Distribution Generalization},
  author={Vaishnavh Nagarajan and Anders Andreassen and Behnam Neyshabur},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.15775}
}
• Published 29 October 2020
• Computer Science
• ArXiv
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only at training time, resulting in poor accuracy at test time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained…
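The spurious-correlation setup the abstract describes is easy to mimic in a toy NumPy sketch (every name and constant below is illustrative, not taken from the paper): a background-like feature agrees with the label on every training example but carries no signal at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n) * 2 - 1                  # labels in {-1, +1}
core = y + rng.normal(0.0, 2.0, n)                 # weakly predictive "core" feature
spurious_train = y.astype(float)                   # background-like feature: matches the label at training time
spurious_test = rng.integers(0, 2, n) * 2.0 - 1.0  # same feature at test time: correlation is broken

corr_train = np.corrcoef(spurious_train, y)[0, 1]  # near 1.0: looks highly predictive during training
corr_test = np.corrcoef(spurious_test, y)[0, 1]    # near 0.0: carries no signal at test time
```

A classifier that leans on `spurious_train` instead of the noisier `core` feature will look excellent on held-in data and fail once the correlation breaks.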
• Computer Science
NeurIPS
• 2021
This paper proposes a novel approach for the domain generalization problem called Model-Based Domain Generalization, which uses unlabeled data from the training domains to learn multi-modal domain transformation models that map data from one training domain to any other domain.
The results indicate that limiting the object area as much as possible leads to consistent improvements in accuracy and robustness, and show that ObjectNet remains a challenging test platform for evaluating the generalization ability of models.
• Mathematics
AISTATS
• 2021
It is shown that the Invariant Risk Minimization (IRM) formulation can fail to capture “natural” invariances, at least when used in its practical “linear” form, and even on very simple problems which directly follow the motivating examples for IRM.
• Computer Science
ArXiv
• 2020
This work compares commonly used audio-to-audio losses on a synthetic benchmark that measures the pitch distance between two stationary sinusoids, suggesting that significant progress can be made in self-supervised audio learning by improving current losses.
• Computer Science
ArXiv
• 2022
Neural networks are often biased to spuriously correlated features that provide misleading statistical evidence that does not generalize. This raises a fundamental question: “Does an optimal unbiased…
• Computer Science
ICML
• 2021
A functional modular probing method is used to analyze deep model structures under the OOD setting, demonstrating that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks.
• Computer Science
ArXiv
• 2022
It is empirically shown that sample-wise adversarial training (AT) yields limited improvement in OOD performance. Two AT variants with low-rank structures are proposed to train OOD-robust models, providing clues that adversarial perturbations with universal structures can enhance robustness against the large data-distribution shifts common in OOD scenarios.
• Computer Science
• 2022
This work shows that classical approaches to combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representations, and proposes a method to remove spurious information from representations during pre-training by pruning or re-initializing later layers of the encoder.
• Computer Science
Computer Methods in Applied Mechanics and Engineering
• 2022
• Computer Science
ArXiv
• 2022
Experiments show that MT-CRL improves MTL model performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2, and that it can indeed alleviate the spurious correlation problem.

References

Showing 1–10 of 47 references

• Computer Science
NeurIPS
• 2019
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
• Computer Science
ICML
• 2021
This work introduces the principle of Risk Extrapolation (REx), and shows conceptually how this principle enables extrapolation, and demonstrates the effectiveness and scalability of instantiations of REx on various OoD generalization tasks.
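One instantiation of Risk Extrapolation discussed in that line of work (V-REx) adds a penalty on the variance of per-domain risks to the mean risk. The sketch below assumes precomputed per-domain risks and an illustrative penalty weight `beta`; it is a minimal sketch of the objective shape, not the paper's implementation.

```python
import numpy as np

def vrex_objective(domain_risks, beta=4.0):
    """Mean risk plus a variance penalty over per-domain risks.

    Penalizing risk variance pushes the model toward equal risk across
    training domains, which is what enables extrapolation beyond them.
    """
    risks = np.asarray(domain_risks, dtype=float)
    return risks.mean() + beta * risks.var()

# equal per-domain risks incur no penalty; unequal risks are penalized
balanced = vrex_objective([0.5, 0.5])  # mean 0.5, variance 0
skewed = vrex_objective([0.0, 1.0])    # mean 0.5 plus 4.0 * 0.25
```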
• Computer Science
ICML
• 2020
The analysis leads to a counterintuitive approach of subsampling the majority group, which empirically achieves low minority error in the overparameterized regime, even though the standard approach of upweighting the minority fails.
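The subsampling approach described above can be sketched as follows; the function name and interface are illustrative, not from the paper. The idea is simply to keep, for every group, only as many examples as the smallest group has.

```python
import numpy as np

def subsample_to_smallest_group(groups, rng=None):
    """Return sorted indices keeping an equal number of examples per group,
    matched to the smallest group's size (i.e. subsample the majority)."""
    rng = rng or np.random.default_rng(0)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_min = min(int((groups == g).sum()) for g in labels)
    keep = np.concatenate([
        rng.choice(np.where(groups == g)[0], size=n_min, replace=False)
        for g in labels
    ])
    return np.sort(keep)

# toy usage: group 0 is the majority, group 1 the minority
groups = np.array([0, 0, 0, 0, 0, 0, 1, 1])
idx = subsample_to_smallest_group(groups)
```

Note that, unlike upweighting, this discards majority-group data outright, which is exactly the counterintuitive aspect the summary highlights.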
• Computer Science
ArXiv
• 2019
The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization, and introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
• Computer Science
ICLR
• 2021
This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
• Computer Science
ICLR
• 2018
A fundamental tradeoff between the amount of test error and the average distance to the nearest error is shown, proving that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d).
• Computer Science
ArXiv
• 2019
Empirical evidence is provided that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the distance from initialization, along with theoretical arguments that further highlight the need for initialization-dependent notions of model capacity.
• Computer Science
J. Mach. Learn. Res.
• 2018
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…
• Computer Science
ArXiv
• 2019
This work proposes a learning framework to improve the shape bias property of self-supervised methods by integrating domain diversification and jigsaw puzzles and shows that this framework outperforms state-of-the-art domain generalization methods by a large margin.
• Computer Science
ICLR
• 2018
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
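The convex-combination recipe can be sketched in a few lines of NumPy; drawing the mixing coefficient from a Beta distribution follows the common mixup setup, but the names and the `alpha` value here are illustrative.

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples: a convex combination of both inputs and (one-hot) labels."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)       # mixing coefficient drawn from Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2    # blended input
    y = lam * y1 + (1.0 - lam) * y2    # blended soft label
    return x, y, lam

# toy usage: two one-hot labelled examples from different classes
x1, y1 = np.array([1.0, 0.0]), np.array([1.0, 0.0])
x2, y2 = np.array([0.0, 1.0]), np.array([0.0, 1.0])
x, y, lam = mixup_batch(x1, y1, x2, y2)
```

Training on such blended pairs encourages linear behavior between examples, which is the regularization effect the summary credits for the generalization gains.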