• Corpus ID: 220830740

Distributionally Robust Losses for Latent Covariate Mixtures

John C. Duchi, Tatsunori B. Hashimoto, Hongseok Namkoong
While modern large-scale datasets often consist of heterogeneous subpopulations---for example, multiple demographic groups or multiple text corpora---the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically… 
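As a rough illustration of what "worst-case performance over all subpopulations of a given size" can mean, the sketch below computes the largest average loss over any reweighting of a sample that gives each point at most 1/(alpha * n) of the total weight, which is the empirical conditional value at risk (CVaR) of the losses. This is a simplified stand-in under stated assumptions, not the procedure proposed in the paper: it ignores covariates entirely, and the function name `worst_case_subpopulation_loss` and the parameter `alpha` (minimum subpopulation proportion) are hypothetical names chosen for this example.

```python
import numpy as np


def worst_case_subpopulation_loss(losses, alpha):
    """Largest average loss over subpopulations of proportion at least alpha.

    Hypothetical helper for illustration only: it treats a subpopulation as
    any reweighting of the n samples in which each sample carries at most
    1 / (alpha * n) of the total weight (the empirical CVaR at level alpha).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Sort losses from largest to smallest so the worst points come first.
    sorted_losses = np.sort(np.asarray(losses, dtype=float))[::-1]
    n = sorted_losses.size
    cap = 1.0 / (alpha * n)          # maximum weight any single sample may carry
    k = int(np.floor(alpha * n))     # number of samples that receive the full cap
    weights = np.zeros(n)
    weights[:k] = cap
    if k < n:
        # Leftover mass goes to the next-worst sample so the weights sum to one.
        weights[k] = max(0.0, 1.0 - k * cap)
    return float(weights @ sorted_losses)


example_losses = [1.0, 2.0, 3.0, 10.0]
average_loss = worst_case_subpopulation_loss(example_losses, alpha=1.0)
worst_half_loss = worst_case_subpopulation_loss(example_losses, alpha=0.5)
```

With `alpha=1.0` the quantity reduces to the ordinary average loss; smaller values of `alpha` put more weight on the hardest examples, which is the gap between average and worst-off performance that the abstract describes.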

Distributionally Robust Learning in Heterogeneous Contexts
A distributionally robust method that focuses on excess risks and achieves a more appropriate trade-off between performance and robustness than the conventional and overly conservative minimax approach is developed.
Evaluating model performance under worst-case subpopulations
A scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models is studied, along with a natural notion of model robustness that is easy to communicate to users, regulators, and business leaders.
On Distributionally Robust Optimization and Data Rebalancing
Theoretical results are established that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset, and suggest that for each DRO problem there exists a data distribution such that learning this distribution is equivalent to solving the DRO problem.
Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality
A non-asymptotic framework for analyzing the out-of-sample performance for Wasserstein robust learning and the generalization bound for its related Lipschitz and gradient regularization problems is developed.
Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation
Theoretical results are established that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset and show that there is merit to both the algorithm-focused and the data-focused side of the bias debate.
Overparameterization Improves Robustness to Covariate Shift in High Dimensions
This work examines the exact high-dimensional asymptotics of random feature regression under covariate shift and presents a precise characterization of the limiting test error, bias, and variance in this setting, providing one of the first theoretical explanations for this ubiquitous empirical phenomenon.
Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
A method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance is given, and this approach is applied to a computer vision task, revealing sensitivity to shifts in non-causal attributes.
How does overparametrization affect performance on minority groups?
In a setting in which the regression functions for the majority and minority groups are different, it is shown that overparameterization always improves minority group performance.
Evaluating Model Robustness to Dataset Shift
A "debiased" estimator is derived that can be used to evaluate robustness, accounts for realistic shifts that cannot be expressed as covariate shift, and maintains $\sqrt{N}$-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters.
Large-Scale Methods for Distributionally Robust Optimization
We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove that our


Learning Models with Uniform Performance via Distributionally Robust Optimization
A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.
Robust Covariate Shift Prediction with General Losses and Feature Views
By robustly minimizing various loss functions, including non-convex ones, under the testing distribution, and by separately shaping the influence of covariate shift according to different feature-based views of the relationship between input variables and example labels, these generalizations make robust covariate shift prediction applicable to more task scenarios.
Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
It is demonstrated that the distributionally robust optimization problems over Wasserstein balls can in fact be reformulated as finite convex programs—in many interesting cases even as tractable linear programs.
This paper shows rigorously that its framework encompasses adaptive regularization as a particular case, and demonstrates empirically that the proposed methodology is able to improve upon a wide range of popular machine learning estimators.
Distributionally Robust Logistic Regression
This paper uses the Wasserstein distance to construct a ball in the space of probability distributions centered at the uniform distribution on the training samples, and proposes a distributionally robust logistic regression model that minimizes a worst-case expected logloss function.
Wasserstein Distributional Robustness and Regularization in Statistical Learning
A broad class of loss functions is identified for which the Wasserstein DRSO is asymptotically equivalent to a regularization problem with a gradient-norm penalty, which suggests a principled way to regularize high-dimensional, non-convex problems.
Robust Wasserstein profile inference and applications to machine learning
Wasserstein Profile Inference is introduced, a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case).
Confidence Intervals for Maximin Effects in Inhomogeneous Large-Scale Data
One challenge of large-scale data analysis is that the assumption of an identical distribution for all samples is often not realistic. An optimal linear regression might, for example, be markedly
Fairness Without Demographics in Repeated Loss Minimization
This paper develops an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution, and proves that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice.
Robust Classification Under Sample Selection Bias
This work develops a framework for learning a robust bias-aware (RBA) probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation and demonstrates the behavior and effectiveness of the approach on binary classification tasks.