Distributionally Robust Models with Parametric Likelihood Ratios

Paul Michel, Tatsunori Hashimoto and Graham Neubig
We show that three simple ideas – mini-batch level normalization, a KL penalty and simultaneous gradient updates – allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained in this way are consistently more robust to subpopulation shifts when compared to other approaches, and that the method performs well with little hyper-parameter tuning.
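The three ideas can be sketched concretely. The code below is a minimal toy illustration, not the paper's implementation: it assumes the adversary outputs unnormalized log likelihood ratios per example, normalizes them at the mini-batch level via a softmax, and penalizes the adversary's KL divergence from the training distribution (which, with batch-normalized weights, reduces to `sum(w * log(n * w))`). In training, the model and the adversary would be updated simultaneously on each batch; here we just compute both objectives.

```python
import math

def normalized_weights(log_ratios):
    """Mini-batch level normalization: a softmax over the adversary's
    unnormalized log likelihood ratios, so the weights sum to 1."""
    m = max(log_ratios)
    exps = [math.exp(r - m) for r in log_ratios]
    z = sum(exps)
    return [e / z for e in exps]

def kl_to_train(weights):
    """Batch estimate of KL(adversary || training distribution); with
    batch-normalized weights this is sum_i w_i * log(n * w_i)."""
    n = len(weights)
    return sum(w * math.log(n * w + 1e-12) for w in weights)

def dro_objectives(losses, log_ratios, kl_coeff=1.0):
    """The model minimizes the weighted loss; the adversary maximizes
    the weighted loss minus the KL penalty. Both would be updated on
    the same batch (simultaneous gradient updates)."""
    w = normalized_weights(log_ratios)
    weighted_loss = sum(wi * li for wi, li in zip(w, losses))
    adversary_obj = weighted_loss - kl_coeff * kl_to_train(w)
    return weighted_loss, adversary_obj
```

With uniform log ratios the weighted loss reduces to the ordinary mean loss and the KL penalty vanishes; skewing the log ratios toward a high-loss example pushes the weighted loss toward that example's loss at a KL cost.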


UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup

This work proposes a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the “mixed” samples according to the sample uncertainty.
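One plausible reading of the idea can be sketched as follows. This is an illustrative assumption, not the paper's exact formula: pairs of examples are mixed as in standard mixup, and each mixed sample carries a loss weight formed from the two samples' per-example uncertainty scores, so mixes involving high-uncertainty (often minority-group) samples count more.

```python
import random

def umix_batch(xs, ys, uncert, alpha=1.0):
    """Hypothetical uncertainty-aware mixup sketch. `xs` are feature
    vectors, `ys` labels, `uncert` per-sample uncertainty scores in
    [0, 1]. Returns (mixed features, (label_i, label_j, lam), weight)
    triples; the weight is a convex combination of the two samples'
    uncertainties, matching the mixing coefficient lam."""
    mixed = []
    n = len(xs)
    for i in range(n):
        j = random.randrange(n)
        lam = random.betavariate(alpha, alpha)  # standard mixup coefficient
        x_mix = [lam * a + (1 - lam) * b for a, b in zip(xs[i], xs[j])]
        w = lam * uncert[i] + (1 - lam) * uncert[j]
        mixed.append((x_mix, (ys[i], ys[j], lam), w))
    return mixed
```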

Learning from a Biased Sample

Applying the distributionally robust optimization framework, this work proposes a method for learning a decision rule that minimizes the worst-case risk over the family of test distributions that could have generated the training distribution under Γ-biased sampling, and shows that this is equivalent to an augmented convex risk minimization problem.

Hedging against Complexity: Distributionally Robust Optimization with Parametric Approximation

This work proposes a simple approach in which the distribution of uncertain parameters is approximated using a parametric family of distributions, which mitigates both sources of complexity but introduces a model misspecification error.

Learning Models with Uniform Performance via Distributionally Robust Optimization

A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.

Distributionally Robust Losses for Latent Covariate Mixtures

The authors propose a convex procedure that controls worst-case subpopulation performance, provide finite-sample (nonparametric) convergence guarantees, and observe significantly improved performance across unseen subpopulations.

Modeling the Second Player in Distributionally Robust Optimization

This paper proposes a relaxation of the KL-constrained inner maximization objective that makes the DRO problem more amenable to gradient-based optimization of large scale generative models, and develops model selection heuristics to guide hyper-parameter search.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization, and introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
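The group DRO update at the heart of this line of work is simple to state. The sketch below is a minimal illustration under the usual formulation: an adversary maintains a distribution `q` over groups and takes an exponentiated-gradient step that up-weights high-loss groups, while the model is trained on the `q`-weighted loss.

```python
import math

def group_dro_step(q, group_losses, eta=1.0):
    """One exponentiated-gradient update of the group weights q (the
    adversary in group DRO): multiply each weight by exp(eta * loss),
    renormalize, and return the resulting robust (q-weighted) loss
    that the model would be trained to minimize."""
    q = [qi * math.exp(eta * li) for qi, li in zip(q, group_losses)]
    z = sum(q)
    q = [qi / z for qi in q]
    robust_loss = sum(qi * li for qi, li in zip(q, group_losses))
    return q, robust_loss
```

Starting from uniform weights, a group with higher loss immediately receives more weight, so the robust loss exceeds the plain average.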

Large-Scale Methods for Distributionally Robust Optimization

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
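The CVaR objective these uncertainty sets induce has a concrete batch estimate: the mean of the worst α-fraction of per-example losses, which equals the worst-case expected loss over subpopulations of mass at least α. A minimal sketch:

```python
def cvar_loss(losses, alpha=0.2):
    """Batch estimate of the conditional value at risk at level alpha:
    the mean of the worst alpha-fraction of per-example losses."""
    k = max(1, int(round(alpha * len(losses))))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k
```

At α = 1 this recovers the ordinary average loss; as α shrinks it focuses on the hardest examples in the batch.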

Examining and Combating Spurious Features under Distribution Shift

This paper defines and analyzes robust and spurious representations using the information-theoretic concept of minimal sufficient statistics, and proves that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.

Pixel Recurrent Neural Networks

A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.

Robust Bayesian Classification Using an Optimistic Score Ratio

This work builds a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution, and showcases the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.

Distributionally Robust Language Modeling

An approach which trains a model that performs well over a wide range of potential test distributions, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.

Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems

This paper proposes a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix) and demonstrates that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently.