# Robust Validation: Confident Predictions Even When Distributions Shift

```bibtex
@article{Cauchois2020RobustVC,
  title   = {Robust Validation: Confident Predictions Even When Distributions Shift},
  author  = {Maxime Cauchois and Suyash Gupta and Alnur Ali and John C. Duchi},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2008.04267}
}
```

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy---coming from robust statistics and optimization---is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present…
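
For context on the baseline that such robust procedures extend, here is a minimal sketch of standard split conformal prediction, whose marginal coverage guarantee is exactly what distribution shift can break. All names are illustrative; this is not the paper's robust procedure, only the non-robust starting point.

```python
import numpy as np

def split_conformal_interval(residuals_cal, alpha=0.1):
    """Half-width of a split-conformal prediction interval.

    residuals_cal: absolute residuals |y_i - f(x_i)| on a held-out
    calibration set; alpha: miscoverage level. Returns the quantile q
    such that [f(x) - q, f(x) + q] covers a fresh point with
    probability >= 1 - alpha, assuming exchangeable data.
    """
    n = len(residuals_cal)
    # finite-sample corrected level: ceil((n+1)(1-alpha)) / n
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(residuals_cal, min(level, 1.0), method="higher")

rng = np.random.default_rng(0)
# toy setup: true response y = 2x + noise, predictor f(x) = 2x
x_cal = rng.uniform(size=500)
y_cal = 2 * x_cal + rng.normal(scale=0.3, size=500)
q = split_conformal_interval(np.abs(y_cal - 2 * x_cal), alpha=0.1)
```

Under a shift in the test distribution the exchangeability assumption fails and this guarantee no longer holds, which is the gap the paper's robust procedures address.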

## 19 Citations

Adaptive Conformal Inference Under Distribution Shift

- Computer Science, Mathematics
- 2021

This work develops methods for forming prediction sets in an online setting where the data-generating distribution may vary over time in an unknown fashion, achieving the desired coverage frequency over long time intervals irrespective of the true data-generating process.

Understanding the Under-Coverage Bias in Uncertainty Estimation

- Computer Science, NeurIPS
- 2021

It is proved that quantile regression suffers from an inherent under-coverage bias, in a vanilla setting where the authors learn a realizable linear quantile function and there is more data than parameters.
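
The cited result concerns estimators trained with the quantile ("pinball") loss; a minimal sketch of that loss, whose population minimizer over constants is the tau-quantile (all names illustrative):

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Quantile ("pinball") loss at level tau; its population
    minimizer over constant predictions is the tau-quantile of y."""
    diff = y - pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(1)
y = rng.normal(size=2000)
# the sample 0.9-quantile (approximately) minimizes the empirical loss,
# so it beats an asymmetric-loss evaluation at the mean
loss_at_q = pinball_loss(y, np.quantile(y, 0.9), tau=0.9)
loss_at_mean = pinball_loss(y, y.mean(), tau=0.9)
```

The cited paper's point is that the quantile learned by minimizing this empirical loss systematically under-covers on fresh data, even in well-specified linear settings.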

Private Prediction Sets

- Computer Science, ArXiv
- 2021

This work develops a method that takes any pre-trained predictive model and outputs differentially private prediction sets, and follows the general approach of split conformal prediction; it uses holdout data to calibrate the size of the prediction sets but preserves privacy by using a privatized quantile subroutine.

Distribution-Free, Risk-Controlling Prediction Sets

- Computer Science, J. ACM
- 2021

This work shows how to generate set-valued predictions from a black-box predictor that controls the expected loss on future test points at a user-specified level, and provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets.

Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

- Computer Science
- 2021

A model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference; it proves a sharpness result showing that for certain classes of prediction problems, the prediction intervals cannot possibly be tightened.

A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

- Computer Science, ArXiv
- 2021

This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.

Conformal prediction for the design problem

- Computer Science, ArXiv
- 2022

This work introduces a method to quantify predictive uncertainty in real-world deployments of machine learning by enabling uncertainty quantification with finite-sample statistical guarantees when the training and test data exhibit a type of dependence that the authors call feedback covariate shift (FCS).

Efficient and Differentiable Conformal Prediction with General Function Classes

- Computer Science, ArXiv
- 2022

A generalization of conformal prediction to multiple learnable parameters is proposed by considering the constrained empirical risk minimization (ERM) problem of finding the most efficient prediction set subject to valid empirical coverage, and it achieves approximate valid population coverage and near-optimal efficiency within class.

## References

Showing 1-10 of 43 references.

Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification

- Computer Science, ICML
- 2014

This paper describes the problem of learning under changing distributions as a game between a learner and an adversary, and provides an algorithm, robust covariate shift adjustment (RCSA), that provides relevant weights.

Robust Wasserstein profile inference and applications to machine learning

- Computer Science, J. Appl. Probab.
- 2019

Wasserstein Profile Inference is introduced, a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case).

Distribution-Free Predictive Inference for Regression

- Computer Science, Mathematics, Journal of the American Statistical Association
- 2018

A general framework for distribution-free predictive inference in regression, using conformal inference, which allows for the construction of a prediction band for the response variable using any estimator of the regression function, and a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference.

Knowing what you know: valid confidence sets in multiclass and multilabel prediction

- Computer Science, ArXiv
- 2020

To address the potential challenge of exponentially large confidence sets in multilabel prediction, this work builds tree-structured classifiers that efficiently account for interactions between labels and that can be bolted on top of any classification model to guarantee its validity.

Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations

- Computer Science, Math. Program.
- 2018

It is demonstrated that the distributionally robust optimization problems over Wasserstein balls can in fact be reformulated as finite convex programs—in many interesting cases even as tractable linear programs.

Learning Models with Uniform Performance via Distributionally Robust Optimization

- Computer Science, Mathematics, ArXiv
- 2018

A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.

The limits of distribution-free conditional predictive inference

- Computer Science, Mathematics, Information and Inference: A Journal of the IMA
- 2020

This work aims to explore the space in between exact conditional inference guarantees and what types of relaxations of the conditional coverage property would alleviate some of the practical concerns with marginal coverage guarantees while still being possible to achieve in a distribution-free setting.

Distributionally Robust Logistic Regression

- Computer Science, Mathematics, NIPS
- 2015

This paper uses the Wasserstein distance to construct a ball in the space of probability distributions centered at the uniform distribution on the training samples, and proposes a distributionally robust logistic regression model that minimizes a worst-case expected logloss function.
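
As an illustration of the kind of reformulation such results enable: when only the features are perturbed under a type-1 Wasserstein ball with Euclidean cost, the worst-case logistic loss reduces to the empirical loss plus a norm penalty, because the loss is Lipschitz in x with constant equal to the parameter norm. A hedged sketch under those assumptions (names illustrative, not the cited paper's exact formulation):

```python
import numpy as np

def wasserstein_dro_logloss(theta, X, y, eps):
    """Worst-case logistic loss over a Wasserstein ball of radius eps
    around the empirical distribution (Euclidean cost on features,
    labels in {-1,+1} held fixed). Since log(1 + exp(-y <theta, x>))
    is Lipschitz in x with constant ||theta||_2, the supremum equals
    the empirical loss plus eps * ||theta||_2: norm regularization.
    """
    margins = y * (X @ theta)
    empirical = np.mean(np.log1p(np.exp(-margins)))
    return empirical + eps * np.linalg.norm(theta)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=100))
theta = np.array([1.0, 0.0, 0.0])
robust = wasserstein_dro_logloss(theta, X, y, eps=0.1)
nominal = wasserstein_dro_logloss(theta, X, y, eps=0.0)
```

With ||theta||_2 = 1, the robust objective exceeds the nominal one by exactly eps, making the robustness budget directly interpretable as a regularization strength.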

Covariate Shift Adaptation by Importance Weighted Cross Validation

- Computer Science, J. Mach. Learn. Res.
- 2007

This paper proposes a new method called importance-weighted cross validation (IWCV), whose unbiasedness even under covariate shift is proved; the IWCV procedure is the only one that can be applied for unbiased model selection under covariate shift.

Variance-based Regularization with Convex Objectives

- Computer Science, NIPS
- 2017

An approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error, and it is shown that this procedure comes with certificates of optimality.