Corpus ID: 239998364

Simple data balancing achieves competitive worst-group-accuracy

@article{Idrissi2021SimpleDB,
  title={Simple data balancing achieves competitive worst-group-accuracy},
  author={Badr Youbi Idrissi and Mart{\'i}n Arjovsky and Mohammad Pezeshki and David Lopez-Paz},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.14503}
}
We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art accuracy, while being faster to train and requiring no additional hyper-parameters. In…
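As a rough illustration of the two baselines the abstract refers to, the sketch below balances a labeled dataset either by subsampling every (class, group) cell to the size of the smallest cell, or by reweighting examples inversely to cell frequency. The function names and details are illustrative and not taken from the paper's released code.

```python
# Minimal sketch of the two balancing baselines described in the abstract:
# subsampling and reweighting by (class, group) cell. Illustrative only.
import numpy as np

def subsample_balanced(y, g, rng=None):
    """Return indices that subsample each (class, group) cell to the smallest cell size."""
    rng = np.random.default_rng() if rng is None else rng
    cells = {}
    for i, key in enumerate(zip(y, g)):
        cells.setdefault(key, []).append(i)
    smallest = min(len(idx) for idx in cells.values())
    keep = [rng.choice(idx, size=smallest, replace=False) for idx in cells.values()]
    return np.sort(np.concatenate(keep))

def reweight_balanced(y, g):
    """Return per-example weights inversely proportional to (class, group) cell frequency."""
    keys = list(zip(y, g))
    counts = {k: keys.count(k) for k in set(keys)}
    w = np.array([1.0 / counts[k] for k in keys])
    return w * len(w) / w.sum()  # normalize so the average weight is 1
```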


References

Showing 1–10 of 36 references
Just Train Twice: Improving Group Robustness without Training Group Information
TLDR
This paper proposes a simple two-stage approach, JTT, that minimizes the loss over a reweighted dataset in which training examples misclassified after a few steps of standard training are upweighted, leading to improved worst-group performance.
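For intuition, here is a minimal sketch of that two-stage recipe on a simple linear classifier (JTT itself is applied to neural networks); the upweighting factor and the choice of identification model are illustrative, not the authors' settings.

```python
# Sketch of the two-stage JTT idea: train briefly, find misclassified
# training points, then retrain with those points upweighted.
import numpy as np
from sklearn.linear_model import LogisticRegression

def jtt_fit(X, y, lam=20.0):
    # Stage 1: identification -- a briefly trained standard (ERM) model
    ident = LogisticRegression(max_iter=50).fit(X, y)
    misclassified = ident.predict(X) != y
    # Stage 2: upweight the misclassified training points and retrain
    weights = np.where(misclassified, lam, 1.0)
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```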
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
TLDR
The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; the paper also introduces a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
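A minimal sketch of the kind of online group DRO update this describes: keep an adversarial weight per group, shift weight toward groups with high loss, and minimize the weighted sum of per-group losses. The step size and plain-Python bookkeeping are simplified choices, not the paper's implementation.

```python
import torch

class GroupDROLoss:
    """Weighted sum of per-group losses with exponentiated-gradient group weights."""

    def __init__(self, n_groups, eta=0.01):
        self.q = torch.ones(n_groups) / n_groups  # adversarial group weights
        self.eta = eta

    def __call__(self, per_example_loss, group_idx):
        # Average loss within each group present in the batch
        group_losses = torch.stack([
            per_example_loss[group_idx == g].mean()
            if (group_idx == g).any() else per_example_loss.new_zeros(())
            for g in range(len(self.q))
        ])
        # Move weight toward the groups with the highest current loss
        with torch.no_grad():
            self.q = self.q * torch.exp(self.eta * group_losses)
            self.q = self.q / self.q.sum()
        return (self.q * group_losses).sum()
```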
In Search of Lost Domain Generalization
TLDR
This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Distributionally Robust Losses Against Mixture Covariate Shifts
Modern large-scale datasets are often collected over heterogeneous subpopulations, such as multiple demographic groups or multiple text corpora. Minimizing average loss over such datasets fails to…
Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation
TLDR
Theoretical results are established that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset, and show that there is merit to both the algorithm-focused and the data-focused sides of the bias debate.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…
No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems
TLDR
This work proposes GEORGE, a method to both measure and mitigate hidden stratification even when subclass labels are unknown, and theoretically characterizes the performance of GEORGE in terms of the worst-case generalization error across any subclass.
What is the Effect of Importance Weighting in Deep Learning?
TLDR
This paper presents the surprising finding that while importance weighting impacts models early in training, its effect diminishes over successive epochs.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
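The core idea can be shown in a few lines: a block learns only a residual F(x) and adds it to its input through an identity shortcut, which keeps very deep stacks trainable. The layer sizes below are illustrative, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # The block learns only the residual F(x); the input passes through unchanged.
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)  # identity shortcut plus learned residual
```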
Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers
TLDR
This work proves that interpolating the distributions of correct and wrong predictions uncovers an oracle distribution in which the unstable correlation vanishes, and uses group distributionally robust optimization to minimize the worst-case risk across all such interpolations.