Corpus ID: 208176471

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can… 
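The worst-case objective that group DRO minimizes can be sketched in a few lines. The NumPy mock-up below is illustrative only (the function name and array interface are assumptions, not the authors' implementation, which in practice uses an online exponentiated-gradient update over group weights during training): it averages the loss within each pre-defined group and returns the maximum.

```python
import numpy as np

def group_dro_loss(per_example_losses, group_ids, n_groups):
    """Worst-case (max) average loss over pre-defined groups.

    per_example_losses: 1-D array of losses, one per example.
    group_ids: 1-D integer array assigning each example to a group.
    """
    group_losses = np.zeros(n_groups)
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            # Average loss restricted to examples in group g.
            group_losses[g] = per_example_losses[mask].mean()
    # Group DRO minimizes the worst group's loss rather than the overall mean.
    return group_losses.max()
```

Minimizing this quantity instead of the overall average is what protects the atypical groups, since a model can no longer trade off a small group's error for average accuracy.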

Figures and Tables from this paper

Citations of this paper

Leveraging Domain Relations for Domain Generalization

This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on, and proposes a new approach called D^3G, which learns domain-specific models by leveraging the relations among different domains.

Explaining Visual Biases as Words by Generating Captions

B2T, a simple and intuitive scheme which generates captions of the mispredicted images using a pre-trained captioning model to extract the common keywords that may describe visual biases, can recover well-known gender and background biases, and discover novel ones in real-world datasets.

Equitable modelling of brain imaging by counterfactual augmentation with morphologically constrained 3D deep generative models

Towards Scalable and Fast Distributionally Robust Optimization for Data-Driven Deep Learning

Experimental results show that large parameterized models trained with the proposed method successfully adapt to the uncertainty set, whether the distribution contains out-of-domain or imbalanced data, and achieve competitive performance and robustness.

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Last Layer Ensemble is proposed, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior, surfacing multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems.

Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling

The proposed FICS can successfully resolve spurious correlations in generated samples on various datasets, with the fairness intervention designed for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios.

Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation

This work suggests that tree-based ensemble models make an effective baseline for tabular data, and are a sensible default when subgroup robustness is desired, even when compared to robustness- and fairness-enhancing methods.

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data

It is shown that Data-IQ’s characterization of examples is most robust to variation across similarly performant (yet different) models, compared to baselines, and that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection.

Evaluating the Impact of Geometric and Statistical Skews on Out-Of-Distribution Generalization

Out-of-distribution (OOD) or domain generalization is the problem of generalizing to unseen distributions; failures arise from spurious correlations, which in turn stem from statistical and geometric skews.

RealPatch: A Statistical Matching Framework for Model Patching with Real Samples

The proposed RealPatch framework performs model patching by augmenting a dataset with real samples, mitigating the need to train generative models for the target task, and can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility.

References



BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Robust Stochastic Approximation Approach to Stochastic Programming

It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

Deep Learning Face Attributes in the Wild

A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.

The Caltech-UCSD Birds-200-2011 Dataset

This work introduces benchmarks and baseline experiments for multi-class categorization and part localization in CUB-200, a challenging dataset of 200 bird species and adds new part localization annotations.

Distributionally Robust Language Modeling

An approach which trains a model that performs well over a wide range of potential test distributions, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
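As a rough illustration of the topic CVaR idea (the function name and discrete formulation are assumptions for this sketch, not the paper's exact objective, which is defined over a continuous uncertainty set of topic mixtures), a discrete conditional value at risk over per-topic losses averages the worst alpha-fraction of topics:

```python
import numpy as np

def topic_cvar_loss(topic_losses, alpha):
    """Discrete CVaR sketch: mean loss of the worst alpha-fraction of topics.

    topic_losses: 1-D array with one average loss per topic.
    alpha: fraction of topics (0 < alpha <= 1) to include in the tail.
    """
    k = max(1, int(np.ceil(alpha * len(topic_losses))))
    # Sort ascending and keep the k largest (worst) topic losses.
    worst = np.sort(topic_losses)[-k:]
    return worst.mean()
```

With alpha = 1 this reduces to the ordinary average over topics, while small alpha focuses the objective on the hardest topics, which is what yields robustness across test distributions.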

Does Distributionally Robust Supervised Learning Give Robust Classifiers?

This paper proves that DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic, and proposes a simple DRSL that overcomes this pessimism, empirically demonstrating its effectiveness.

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.