Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
@article{Sagawa2019DistributionallyRN,
  title   = {Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization},
  author  = {Shiori Sagawa and Pang Wei Koh and Tatsunori B. Hashimoto and Percy Liang},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1911.08731}
}
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can…
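The group DRO objective described in the abstract can be sketched as an online reweighting scheme: maintain one weight per group, upweight groups with high loss, and minimize the weighted loss. The snippet below is a minimal illustrative sketch of that idea with placeholder group losses, not the paper's implementation; the function names and the step size `eta` are assumptions for illustration.

```python
import math

def group_dro_weights(group_losses, q, eta=0.1):
    """One exponentiated-gradient step: multiplicatively upweight high-loss groups."""
    q = [qi * math.exp(eta * li) for qi, li in zip(q, group_losses)]
    total = sum(q)
    return [qi / total for qi in q]  # renormalize to a distribution

def robust_loss(group_losses, q):
    """Weighted training loss; as q concentrates, this approaches the worst-group loss."""
    return sum(qi * li for qi, li in zip(q, group_losses))

# Toy example: group 2 has the highest loss, so its weight grows over iterations.
q = [0.25, 0.25, 0.25, 0.25]
losses = [0.3, 0.5, 1.2, 0.4]
for _ in range(50):
    q = group_dro_weights(losses, q, eta=0.5)
print(max(range(4), key=lambda g: q[g]))  # prints 2, the worst group
```

In practice the group losses would come from minibatches and the model parameters would be updated on `robust_loss`; the key point the abstract makes is that this objective alone is not enough for overparameterized networks without strong regularization.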
501 Citations
Leveraging Domain Relations for Domain Generalization
- Computer Science
- 2023
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on, and proposes a new approach called D^3G, which learns domain-specific models by leveraging the relations among different domains.
Explaining Visual Biases as Words by Generating Captions
- Computer Science · ArXiv
- 2023
B2T, a simple and intuitive scheme which generates captions of the mispredicted images using a pre-trained captioning model to extract the common keywords that may describe visual biases, can recover well-known gender and background biases, and discover novel ones in real-world datasets.
Equitable modelling of brain imaging by counterfactual augmentation with morphologically constrained 3D deep generative models
- Computer Science · Medical Image Anal.
- 2023
Towards Scalable and Fast Distributionally Robust Optimization for Data-Driven Deep Learning
- Computer Science · 2022 IEEE International Conference on Data Mining (ICDM)
- 2022
Experimental results show that large parameterized models trained with the proposed method successfully adapt to the uncertainty set whether the distribution is out-of-domain or imbalanced, and achieve competitive performance and robustness.
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
- Computer Science · ArXiv
- 2022
Last Layer Ensemble is proposed, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior, and multi-shortcut mitigation is surfaced as an overlooked challenge critical to advancing the reliability of vision systems.
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
- Computer Science · ArXiv
- 2022
The proposed FICS successfully resolves spurious correlations in generated samples on various datasets, with the fairness intervention designed for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios.
Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation
- Computer Science · ArXiv
- 2022
This work suggests that tree-based ensemble models make an effective baseline for tabular data and are a sensible default when subgroup robustness is desired, even when compared to robustness- and fairness-enhancing methods.
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data
- Computer Science · ArXiv
- 2022
It is shown that Data-IQ’s characterization of examples is most robust to variation across similarly performant (yet different) models, compared to baselines, and that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection.
Evaluating the Impact of Geometric and Statistical Skews on Out-Of-Distribution Generalization
- Computer Science
- 2022
Out-of-distribution (OOD) or domain generalization is the problem of generalizing to unseen distributions; it arises from spurious correlations, which in turn stem from statistical and geometric skews.
RealPatch: A Statistical Matching Framework for Model Patching with Real Samples
- Computer Science · ECCV
- 2022
The proposed RealPatch framework performs model patching by augmenting a dataset with real samples, obviating the need to train generative models for the target task, and can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility.
References
Showing 1–10 of 67 references
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Computer Science · NAACL
- 2019
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Deep Residual Learning for Image Recognition
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Robust Stochastic Approximation Approach to Stochastic Programming
- Computer Science, Mathematics · SIAM J. Optim.
- 2009
It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
- Computer Science · NAACL
- 2018
The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.
Understanding deep learning requires rethinking generalization
- Computer Science · ICLR
- 2017
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Deep Learning Face Attributes in the Wild
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.
The Caltech-UCSD Birds-200-2011 Dataset
- Computer Science
- 2011
This work introduces benchmarks and baseline experiments for multi-class categorization and part localization in CUB-200, a challenging dataset of 200 bird species, and adds new part localization annotations.
Distributionally Robust Language Modeling
- Computer Science · EMNLP
- 2019
An approach which trains a model that performs well over a wide range of potential test distributions, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
Does Distributionally Robust Supervised Learning Give Robust Classifiers?
- Computer Science · ICML
- 2018
This paper proves that the DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic, proposes a simple DRSL that overcomes this pessimism, and empirically demonstrates its effectiveness.
Annotation Artifacts in Natural Language Inference Data
- Computer Science · NAACL
- 2018
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.