# Distributionally Robust Models with Parametric Likelihood Ratios

```bibtex
@article{Michel2022DistributionallyRM,
  title   = {Distributionally Robust Models with Parametric Likelihood Ratios},
  author  = {Paul Michel and Tatsunori Hashimoto and Graham Neubig},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2204.06340}
}
```
• Published 13 April 2022 · Computer Science · ArXiv
In this paper, we show that three simple ideas – mini-batch level normalization, a KL penalty and simultaneous gradient updates – allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning.
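
The abstract only names the three ingredients, so a compact sketch may help make them concrete. This is a minimal illustration, assuming a PyTorch setup in which `model` is the classifier and `adversary` is a small network mapping inputs to unnormalized log likelihood ratios; the function name `dro_step`, the `kl_coef` coefficient, and the exact normalization and KL details are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of the three ideas; not the paper's official code.
import torch
import torch.nn.functional as F

def dro_step(model, adversary, model_opt, adv_opt, x, y, kl_coef=1.0):
    """One simultaneous DRO update for the model and the parametric adversary."""
    losses = F.cross_entropy(model(x), y, reduction="none")  # per-example losses
    n = x.size(0)

    # Idea 1: mini-batch level normalization -- a softmax over the batch turns
    # the adversary's unnormalized log-ratios into weights with batch mean 1.
    log_ratio = adversary(x).squeeze(-1)
    weights = torch.softmax(log_ratio, dim=0) * n

    # Idea 2: KL penalty keeping the reweighted distribution q close to the
    # uniform empirical distribution p: KL(q || p) = sum_i q_i * log(n * q_i).
    q = weights / n  # sums to 1 over the batch
    kl = (q * (n * q).clamp_min(1e-12).log()).sum()

    model_loss = (weights.detach() * losses).mean()                # model minimizes
    adv_loss = -(weights * losses.detach()).mean() + kl_coef * kl  # adversary maximizes

    # Idea 3: simultaneous gradient updates -- both players step from the same
    # forward pass instead of alternating inner and outer optimization loops.
    model_opt.zero_grad(); adv_opt.zero_grad()
    model_loss.backward()
    adv_loss.backward()
    model_opt.step(); adv_opt.step()
    return model_loss.item()
```

The two `.detach()` calls keep the players' gradients independent, which is what lets both optimizers step from a single forward pass.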

## Citations

• Computer Science · ArXiv · 2022
This work proposes a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the "mixed" samples according to the sample uncertainty (a generic sketch of this idea appears after this list).
• Computer Science · ArXiv · 2022
Applying the distributionally robust optimization framework, this work proposes a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions that can generate the training distribution under Γ-biased sampling; the resulting problem is equivalent to an augmented convex risk minimization problem.
• Computer Science · ArXiv · 2022
This work proposes a simple approach in which the distribution of uncertain parameters is approximated using a parametric family of distributions, which mitigates both sources of complexity but introduces a model misspecification error.
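
The first citation above describes reweighting "mixed" samples by a sample-uncertainty estimate. Below is a generic, hedged sketch of that idea: the predictive-entropy uncertainty, the normalization by its batch mean, and the function name are illustrative assumptions, not necessarily UMIX's exact weighting scheme.

```python
# Generic uncertainty-weighted mixup; UMIX's actual formulation may differ.
import torch
import torch.nn.functional as F

def uncertainty_weighted_mixup_loss(model, x, y, alpha=0.2):
    # Standard mixup: convex-combine each example with a random partner.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]

    logits = model(x_mix)
    with torch.no_grad():
        # Uncertainty proxy: predictive entropy of each mixed sample.
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        weights = entropy / entropy.mean()  # up-weight uncertain mixed samples

    # Mixup loss against both labels, reweighted per sample.
    loss = lam * F.cross_entropy(logits, y, reduction="none") \
         + (1 - lam) * F.cross_entropy(logits, y[perm], reduction="none")
    return (weights * loss).mean()
```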

## References

Showing 1–10 of 51 references.

• Computer Science, Mathematics · ArXiv · 2018
A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.
• Computer Science · ArXiv · 2020
The authors propose a convex procedure that controls worst-case subpopulation performance, provide finite-sample (nonparametric) convergence guarantees, and observe significantly improved performance across unseen subpopulations.
• Computer Science · ICLR · 2021
This paper proposes a relaxation of the KL-constrained inner maximization objective that makes the DRO problem more amenable to gradient-based optimization of large-scale generative models, and develops model selection heuristics to guide hyper-parameter search.
• Computer Science · ArXiv · 2019
The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; the authors also introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models (a minimal sketch of this update appears after this list).
• Computer Science · NeurIPS · 2020
We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
• Computer Science · ICML · 2021
This paper defines and analyzes robust and spurious representations using the information-theoretic concept of minimal sufficient statistics, and proves that even when only the input distribution is biased, models can still pick up spurious features from their training data.
• Computer Science · ICML · 2016
A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
• Computer Science · ICML · 2020
This work builds a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution, and showcases the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
• Computer Science · EMNLP · 2019
An approach which trains a model that performs well over a wide range of potential test distributions, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
• Computer Science · Oper. Res. · 2010
This paper proposes a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix) and demonstrates that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently.
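
The 2019 reference above on group DRO introduces a stochastic algorithm whose core update is short enough to sketch. Assuming group membership arrives as integer labels per example, one common rendition pairs exponentiated-gradient ascent on group weights with gradient descent on the weighted loss; the step size `eta_q` and the per-batch group averaging below are illustrative assumptions rather than the paper's exact procedure.

```python
# Minimal group DRO step: adversarial group weights via exponentiated gradient.
import torch
import torch.nn.functional as F

def group_dro_step(model, opt, q, x, y, group, eta_q=0.01):
    """q: current weights over groups (non-negative, sums to 1)."""
    losses = F.cross_entropy(model(x), y, reduction="none")

    # Average the loss within each group present in the batch.
    n_groups = q.numel()
    group_loss = torch.zeros(n_groups)
    for g in range(n_groups):
        mask = group == g
        if mask.any():
            group_loss[g] = losses[mask].mean()

    # Adversary: exponentiated-gradient ascent on the group weights.
    q = q * torch.exp(eta_q * group_loss.detach())
    q = q / q.sum()

    # Model: gradient descent on the q-weighted (worst-case) group loss.
    robust_loss = (q * group_loss).sum()
    opt.zero_grad()
    robust_loss.backward()
    opt.step()
    return q, robust_loss.item()
```

Starting from uniform weights, `q = torch.ones(n_groups) / n_groups`, the weights drift toward the groups with the highest running loss, which is what makes the objective distributionally robust over groups.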