Distributionally Robust Losses for Latent Covariate Mixtures

John C. Duchi, Tatsunori B. Hashimoto, and Hongseok Namkoong
Reliable Machine Learning via Structured Distributionally Robust Optimization

Data sets used to train machine learning (ML) models often suffer from sampling biases and underrepresent marginalized groups. Standard ML models are trained to optimize average performance and perform poorly on tail subpopulations. In “Distributionally Robust Losses for Latent Covariate Mixtures,” John Duchi, Tatsunori Hashimoto, and Hongseok Namkoong formulate a DRO approach for training ML models to…
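
The worst-case-subpopulation objective that recurs throughout the papers below can be illustrated with its dual (CVaR) view: for a minimum subgroup fraction α, the robust loss is the average loss over the worst-performing α-fraction of examples. The following is a minimal pure-Python sketch of that general idea, not the latent-covariate-mixture formulation of the paper above:

```python
def cvar_robust_loss(losses, alpha):
    """Worst-case average loss over any subpopulation of mass >= alpha.

    Equals the Conditional Value-at-Risk (CVaR) at level alpha:
    the mean of the worst alpha-fraction of per-example losses.
    """
    assert 0 < alpha <= 1
    k = max(1, int(round(alpha * len(losses))))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / len(worst)

# Average loss looks fine (~0.3), but a 25%-mass subpopulation
# suffers loss 0.9 -- the robust objective exposes this.
losses = [0.1, 0.1, 0.1, 0.9]
average = sum(losses) / len(losses)
robust = cvar_robust_loss(losses, alpha=0.25)  # 0.9
```

Minimizing `cvar_robust_loss` instead of `average` is what drives the "performs well on every large-enough subgroup" guarantees in the works listed below.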

Figures and Tables from this paper

Blind Pareto Fairness and Subgroup Robustness

The proposed Blind Pareto Fairness (BPF) is a method that leverages no-regret dynamics to recover a fair minimax classifier that reduces worst-case risk of any potential subgroup of sufficient size, and guarantees that the remaining population receives the best possible level of service.

An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization

This work rigorously demonstrates that extrapolation is computationally much harder than interpolation, though their statistical complexity is not significantly different, and shows that ERM—possibly with added structured noise—is provably minimax-optimal for both tasks.

Higher-Order Expansion and Bartlett Correctability of Distributionally Robust Optimization

Distributionally robust optimization (DRO) is a worst-case framework for stochastic optimization under uncertainty that has attracted fast-growing interest in recent years. When the underlying probability…

Complexity-Free Generalization via Distributionally Robust Optimization

This paper presents an alternate route to obtain generalization bounds on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of ambiguity set to capture statistical uncertainty.

Evaluating Model Robustness and Stability to Dataset Shift

A “debiased” estimator is derived which maintains √N-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters, and in experiments on a real medical risk prediction task, this estimator can be used to analyze stability and accounts for realistic shifts that could not previously be expressed.

Learning from a Biased Sample

This work proposes a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions whose conditional distributions of outcomes Y given covariates X are determined from the conditional training distribution, and whose covariate distributions are absolutely continuous with respect to the covariate distribution of the training data.

Evaluating Model Robustness to Dataset Shift

A "debiased" estimator is derived which maintains √N-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters; this estimator can be used to evaluate robustness and accounts for realistic shifts that cannot be expressed as covariate shift.

Distributionally Robust Survival Analysis: A Novel Fairness Loss Without Demographics

We propose a new method for training survival analysis models that minimizes a worst-case error across all subpopulations that are large enough (occurring with at least a user-specified minimum probability).

Improving Fairness Generalization Through a Sample-Robust Optimization Method

A new robustness framework for statistical fairness in machine learning, inspired by the domain of Distributionally Robust Optimization, is proposed; it ensures fairness over a variety of samplings of the training set and effectively improves fairness generalization.

Long Term Fairness for Minority Groups via Performative Distributionally Robust Optimization

Four key shortcomings of these formal fairness criteria are identified, and extending performative prediction to include a distributionally robust objective is proposed to help address them.

This paper shows rigorously that its framework encompasses adaptive regularization as a particular case, and demonstrates empirically that the proposed methodology is able to improve upon a wide range of popular machine learning estimators.

Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

This tutorial argues that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.

Combating Conservativeness in Data-Driven Optimization under Uncertainty: A Solution Path Approach

This paper investigates a validation-based strategy to avoid set estimation by exploiting the intrinsic low dimensionality among all possible solutions output from a given reformulation, and achieves asymptotically optimal solutions, regarded as the least conservative with respect to the considered reformulation classes.

Distributionally Robust Optimization and Generalization in Kernel Methods

It is shown that MMD DRO is roughly equivalent to regularization by the Hilbert norm and, as a byproduct, reveals deep connections to classic results in statistical learning.
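
As a concrete anchor for the MMD ambiguity set mentioned here, below is a minimal pure-Python estimate of the squared maximum mean discrepancy under a Gaussian kernel (an illustrative sketch, not code from the paper):

```python
import math
import random

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two 1-D samples under a
    Gaussian (RBF) kernel: the squared RKHS distance between the
    samples' kernel mean embeddings."""
    def k(a, b):
        return math.exp(-(a - b) ** 2 / (2 * bandwidth ** 2))
    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

random.seed(0)
p = [random.gauss(0, 1) for _ in range(200)]
q_same = [random.gauss(0, 1) for _ in range(200)]
q_shift = [random.gauss(3, 1) for _ in range(200)]

same = mmd_squared(p, q_same)    # near zero: same distribution
shift = mmd_squared(p, q_shift)  # clearly positive: mean-shifted distribution
```

An MMD ambiguity set is then the ball of distributions whose embedding lies within a chosen MMD radius of the empirical one.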

Estimation of Integral Functionals of a Density

Let φ be a smooth function of k + 2 variables. We shall investigate in this paper the rates of convergence of estimators of T(f) = ∫ φ(f(x), f′(x), …, f^(k)(x), x) dx when f belongs to some class of…
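
Concrete special cases of such integral functionals of a density, written out for reference:

```latex
% T(f) = \int \varphi\bigl(f(x), f'(x), \dots, f^{(k)}(x), x\bigr)\,dx
% recovers familiar functionals for simple choices of \varphi:
T(f) = \int f(x)^2 \,dx           % \varphi(u, x) = u^2        (quadratic functional)
T(f) = \int f(x)\log f(x) \,dx    % \varphi(u, x) = u\log u    (negative entropy)
T(f) = \int f'(x)^2 \,dx          % \varphi(u, v, x) = v^2     (Fisher-type functional)
```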

Covariate Shift by Kernel Mean Matching

This chapter contains sections titled: Introduction, Sample Reweighting, Distribution Matching, Risk Estimates, The Connection to Single Class Support Vector Machines, Experiments, and Conclusion.
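
The sample-reweighting idea underlying this chapter can be sketched in a toy setting where both covariate densities are known in closed form (kernel mean matching itself estimates the weights from samples by matching kernel mean embeddings, without density estimation):

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Covariate shift: training covariates ~ N(0,1), test covariates ~ N(1,1),
# while P(y | x) is unchanged. Reweighting each training point by
# w(x) = p_test(x) / p_train(x) makes weighted training averages
# estimate test-distribution averages.
random.seed(0)
train = [random.gauss(0, 1) for _ in range(100_000)]
weights = [gauss_pdf(x, 1, 1) / gauss_pdf(x, 0, 1) for x in train]

plain_mean = sum(train) / len(train)              # near 0: the train mean
weighted_mean = (sum(w * x for w, x in zip(weights, train))
                 / sum(weights))                  # near 1: the test mean
```

The same weights, plugged into a weighted empirical risk, correct the training objective for the shift in covariates.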

Robust Wasserstein profile inference and applications to machine learning

Wasserstein Profile Inference is introduced, a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case).

Weak Convergence and Empirical Processes: With Applications to Statistics

This chapter discusses weak convergence, almost-uniform convergence, and convergence in probability, focusing on the part of the Donsker property concerned with uniformity and metrization.

Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved

This paper decomposes the biases in estimating outcome disparity via threshold-based imputation into multiple interpretable bias sources, allowing us to explain when over- or underestimation occurs and proposes an alternative weighted estimator that uses soft classification.
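
The contrast between threshold-based imputation and a soft-classification weighted estimator can be sketched in a toy example (data and names are illustrative; the paper's estimator additionally corrects the identified bias sources):

```python
def disparity_estimates(outcomes, proba_group_a):
    """Gap in mean outcome between two groups whose membership is
    unobserved and only predicted: proba_group_a[i] is a classifier's
    probability that example i belongs to group A."""
    # Threshold-based imputation: hard-assign each example at 0.5.
    hard_a = [y for y, p in zip(outcomes, proba_group_a) if p >= 0.5]
    hard_b = [y for y, p in zip(outcomes, proba_group_a) if p < 0.5]
    hard_gap = sum(hard_a) / len(hard_a) - sum(hard_b) / len(hard_b)

    # Soft weighting: every outcome contributes to both group means,
    # weighted by its membership probability.
    wa = (sum(p * y for y, p in zip(outcomes, proba_group_a))
          / sum(proba_group_a))
    wb = (sum((1 - p) * y for y, p in zip(outcomes, proba_group_a))
          / sum(1 - p for p in proba_group_a))
    return hard_gap, wa - wb

# Thresholding washes out the disparity here; soft weighting detects it.
hard_gap, soft_gap = disparity_estimates(
    outcomes=[1, 0, 1, 0], proba_group_a=[0.9, 0.8, 0.2, 0.1])
```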