Corpus ID: 218581492

An Investigation of Why Overparameterization Exacerbates Spurious Correlations

Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang
We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data. Through simulations and experiments on two image datasets, we identify two key properties of the training data that drive this behavior: the proportions of majority versus minority groups, and the signal-to-noise ratio of the spurious correlations. We then analyze a… 
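The setup described in the abstract (groups defined jointly by a label and a spurious attribute, with majority groups where the two agree) can be sketched with a toy data generator. The function name, parameters, and feature construction below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def make_spurious_data(n=1000, p_maj=0.9, noise=0.5, seed=0):
    """Toy data in the spirit of the paper's setup (names are illustrative):
    label y in {-1, +1}; spurious attribute a agrees with y with
    probability p_maj (majority groups) and disagrees otherwise (minority)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)
    agree = rng.random(n) < p_maj             # majority-group membership
    a = np.where(agree, y, -y)                # spurious attribute
    x_core = y + noise * rng.standard_normal(n)   # noisy core feature
    x_spur = a + noise * rng.standard_normal(n)   # noisy spurious feature
    X = np.column_stack([x_core, x_spur])
    groups = (y > 0).astype(int) * 2 + (a > 0).astype(int)  # 4 groups
    return X, y, groups

X, y, g = make_spurious_data()
```

Varying `p_maj` controls the majority/minority proportions and `noise` the signal-to-noise ratio, the two properties the abstract identifies as driving the behavior.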

Information-Theoretic Bias Reduction via Causal View of Spurious Correlation

A novel debiasing framework against algorithmic bias is proposed, incorporating a bias-regularization loss derived from an information-theoretic bias measurement; the approach is validated in diverse realistic scenarios.

Simple data balancing achieves competitive worst-group-accuracy

The results show that these data balancing baselines achieve state-of-the-art worst-group accuracy while being faster to train and requiring no additional hyperparameters.
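One simple balancing baseline of this kind is group-balanced subsampling. A minimal sketch under our own assumptions (the helper name and details are not the paper's implementation):

```python
import numpy as np

def subsample_balanced(groups, seed=0):
    """Group-balanced subsampling: keep min-group-size examples per group,
    chosen uniformly at random without replacement."""
    rng = np.random.default_rng(seed)
    uniq, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(groups == u), size=n_min, replace=False)
        for u in uniq
    ])
    return np.sort(keep)

g = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
idx = subsample_balanced(g)   # two examples retained from each group
```

Reweighting (scaling each example's loss inversely to its group size) is the other common balancing baseline and needs no data to be discarded.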

Identifying spurious correlations for robust text classification

This paper treats identifying spurious correlations as a supervised classification problem, using features derived from treatment-effect estimators to distinguish spurious correlations from "genuine" ones; the approach works well even with limited training examples, and the word classifier can be transported to new domains.

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

This work provides a unified and efficient framework for Metadata Archaeology, i.e., uncovering and inferring the metadata of examples in a dataset, and is on par with far more sophisticated mitigation methods across different tasks.

Importance Tempering: Group Robustness for Overparameterized Models

Importance tempering is proposed to improve the decision boundary of overparameterized models; it achieves consistently better results and state-of-the-art performance on worst-group classification tasks.

How does overparametrization affect performance on minority groups?

In a setting in which the regression functions for the majority and minority groups are different, it is shown that overparameterization always improves minority group performance.

Improved Worst-Group Robustness via Classifier Retraining on Independent Splits

This work develops a method, called CRIS, that improves upon state-of-the-art methods, such as Group DRO, on standard datasets while relying on much fewer group labels and little additional hyperparameter tuning.

Improving Out-of-Distribution Robustness via Selective Augmentation

The effectiveness of LISA is studied: it consistently outperforms other state-of-the-art methods and leads to more invariant predictors, and a linear setting is analyzed to show theoretically how LISA achieves a smaller worst-group error.

Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models

This paper aims to automatically identify spurious correlations in NLP models at scale, leveraging existing interpretability methods to extract tokens from the input text that significantly affect the model's decision process and classifying those tokens as "genuine" or "spurious".

Distinguishing rule- and exemplar-based generalization in learning systems

It is shown that standard neural network models are feature-biased and exemplar-based, and the implications for machine learning research on systematic generalization, fairness, and data augmentation are discussed.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; a stochastic optimization algorithm with convergence guarantees is introduced to efficiently train group DRO models.
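The stochastic group-weighting idea behind group DRO can be sketched as an exponentiated-gradient update on per-group weights, upweighting whichever group currently has the highest loss. This is a simplified illustration of the principle, not the paper's exact algorithm:

```python
import numpy as np

def group_dro_weights(group_losses_seq, eta=0.1):
    """Online update of group weights q: groups with higher loss are
    exponentially upweighted, then q is renormalized to a distribution.
    The per-example gradient would be scaled by q[group] during training."""
    n_groups = len(group_losses_seq[0])
    q = np.full(n_groups, 1.0 / n_groups)
    for losses in group_losses_seq:
        q = q * np.exp(eta * np.asarray(losses))
        q = q / q.sum()
    return q

# group 1 consistently incurs the highest loss, so it ends up upweighted
q = group_dro_weights([[0.2, 1.0, 0.3]] * 50)
```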

Reconciling modern machine-learning practice and the classical bias–variance trade-off

This work shows how classical theory and modern practice can be reconciled within a single unified performance curve and proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets.

A systematic study of the class imbalance problem in convolutional neural networks

Robust Solutions of Optimization Problems Affected by Uncertain Probabilities

The robust counterpart of a linear optimization problem with φ-divergence uncertainty is shown to be tractable for most choices of φ typically considered in the literature, and the results are extended to problems that are nonlinear in the optimization variables.

Distributionally Robust Language Modeling

An approach which trains a model that performs well over a wide range of potential test distributions, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

This work measures the bias and variance of neural networks, finding that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data, and corroborates these empirical results with a theoretical analysis of two-layer linear networks with a random first layer.

Deep double descent: where bigger models and more data hurt

A new complexity measure, the effective model complexity, is defined; it identifies regimes where increasing the number of training samples actually hurts test performance, and a generalized double descent is conjectured with respect to this measure.

The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve

Deep learning methods operate in regimes that defy the traditional statistical mindset: models are so rich that they can interpolate the observed labels, even when the latter are replaced by pure noise.

Natural Adversarial Examples

This work introduces two challenging datasets that reliably cause machine learning model performance to substantially degrade, and curates an adversarial out-of-distribution detection dataset called IMAGENET-O, the first out-of-distribution detection dataset created for ImageNet models.

Benign overfitting in linear regression

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
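For linear regression, the minimum-norm interpolating rule discussed here is the pseudoinverse solution; a minimal numpy sketch of interpolation in the overparameterized regime (dimensions chosen for illustration):

```python
import numpy as np

# Minimum L2-norm interpolating least squares when d > n: among the
# infinitely many weight vectors that fit the training data exactly,
# the pseudoinverse returns the one of smallest Euclidean norm.
rng = np.random.default_rng(0)
n, d = 20, 100                        # more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)            # even pure-noise labels get interpolated
w = np.linalg.pinv(X) @ y             # min-norm solution
train_err = np.max(np.abs(X @ w - y)) # ~0: training data is fit exactly
```

Whether this interpolator also predicts well on new data is exactly the "benign overfitting" question the paper characterizes.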