
Distribution-free uncertainty quantification for classification under label shift

Aleksandr Podkopaev, Aaditya Ramdas
Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues — prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning — since these possess distribution-free guarantees for i.i.d. data. Two common ways of generalizing beyond the i.i.d. setting include handling covariate and label shift. Within the… 
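For concreteness, the first avenue can be illustrated with split (inductive) conformal prediction for classification under exchangeable data: a held-out calibration set fixes a score threshold, and every label whose score clears the threshold enters the prediction set. The sketch below is a generic, numpy-only illustration; the function name and interface are ours, not the paper's.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets for classification (generic sketch).

    cal_probs:  (n, K) predicted class probabilities on a held-out calibration set
    cal_labels: (n,)   true labels of the calibration set
    test_probs: (m, K) predicted class probabilities on test points
    Returns a boolean (m, K) matrix; entry [i, k] is True if class k is in the set for point i.
    """
    n = len(cal_labels)
    # Conformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected (1 - alpha) empirical quantile of the calibration scores.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    qhat = np.sort(scores)[k - 1]
    # Include every class whose score does not exceed the threshold.
    return (1.0 - test_probs) <= qhat
```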


PAC Prediction Sets Under Covariate Shift
This work proposes a novel approach that addresses the challenge of rigorously quantifying the uncertainty of model predictions by constructing probably approximately correct (PAC) prediction sets in the presence of covariate shift.
Top-label calibration
A histogram binning algorithm is formalized that reduces top-label multiclass calibration to the binary case; it is proved to have clean theoretical guarantees without distributional assumptions, and a methodical study of its practical performance is performed.
Top-label calibration and multiclass-to-binary reductions
A new and arguably natural notion of top-label calibration is proposed, which requires the reported probability of the most likely label to be calibrated and is achieved through reductions to underlying binary calibration routines.
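Both top-label entries above reduce multiclass calibration to a binary problem on the predicted top label. A minimal sketch of that reduction using plain fixed-width histogram binning (not the papers' exact procedure; all names are illustrative):

```python
import numpy as np

def fit_top_label_binning(probs, labels, n_bins=10):
    """Histogram-binning recalibration of the top-label confidence (sketch).

    probs:  (n, K) predicted probabilities on a calibration set
    labels: (n,)   true labels
    Returns bin edges and the empirical accuracy within each bin.
    """
    conf = probs.max(axis=1)                                   # top-label confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)   # binary reduction target
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    bin_acc = np.array([
        correct[bin_ids == b].mean() if (bin_ids == b).any() else edges[b:b + 2].mean()
        for b in range(n_bins)
    ])
    return edges, bin_acc

def recalibrated_top_label(probs, edges, bin_acc):
    """Replace each top-label confidence with its bin's empirical accuracy."""
    conf = probs.max(axis=1)
    bin_ids = np.clip(np.digitize(conf, edges[1:-1]), 0, len(bin_acc) - 1)
    return probs.argmax(axis=1), bin_acc[bin_ids]
```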
Tracking the risk of a deployed model and detecting harmful distribution shifts
This work designs simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration, and demonstrates the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets.


Detecting and Correcting for Label Shift with Black Box Predictors
Black Box Shift Estimation (BBSE) is proposed to estimate the test-time label distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
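The BBSE estimator reduces label-shift correction to a linear system: the joint confusion matrix of a black-box predictor on labeled source data is solved against the distribution of its predictions on unlabeled target data. A rough numpy sketch, assuming the confusion matrix is invertible (identifiers are ours):

```python
import numpy as np

def bbse_weights(source_preds, source_labels, target_preds, n_classes):
    """Black Box Shift Estimation of label-shift importance weights w(y) = q(y) / p(y).

    source_preds, source_labels: black-box predictions and true labels on labeled source data
    target_preds:                black-box predictions on unlabeled target data
    Assumes the joint confusion matrix C[i, j] = P(f(x) = i, y = j) is invertible.
    """
    # Joint confusion matrix estimated on the source distribution.
    C = np.zeros((n_classes, n_classes))
    for yhat, y in zip(source_preds, source_labels):
        C[yhat, y] += 1.0
    C /= len(source_labels)
    # Distribution of black-box predictions on the target data.
    mu = np.bincount(target_preds, minlength=n_classes) / len(target_preds)
    # Solve C w = mu for the importance weights; clip negatives caused by sampling noise.
    w = np.linalg.solve(C, mu)
    return np.clip(w, 0.0, None)
```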
Knowing what you know: valid confidence sets in multiclass and multilabel prediction
To address the potential challenge of exponentially large confidence sets in multilabel prediction, this work builds tree-structured classifiers that efficiently account for interactions between labels and can be bolted on top of any classification model to guarantee its validity.
Distribution-free binary classification: prediction sets, confidence intervals and calibration
A 'tripod' of theorems is established that connects three notions of uncertainty quantification (calibration, confidence intervals, and prediction sets) for binary classification in the distribution-free setting, that is, without making any distributional assumptions on the data.
Regularized Learning for Domain Adaptation under Label Shifts
We propose Regularized Learning under Label Shifts (RLLS), a principled and practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain.
Conformal Prediction Under Covariate Shift
It is shown that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known.
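The weighting idea can be sketched as a weighted quantile of calibration scores, with weights given by the known likelihood ratio between test and training covariate distributions. The snippet below is an illustrative simplification, not the paper's exact algorithm:

```python
import numpy as np

def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha=0.1):
    """Score threshold for weighted split conformal prediction under covariate shift (sketch).

    cal_scores:  conformity scores on the calibration set
    cal_weights: likelihood ratios w(x_i) = dP_test/dP_train(x_i) at calibration points
    test_weight: likelihood ratio at the test point
    """
    # Normalize the weights; the test point carries the remaining mass with score +inf.
    total = cal_weights.sum() + test_weight
    p = np.append(cal_weights, test_weight) / total
    scores = np.append(cal_scores, np.inf)
    # Weighted (1 - alpha) quantile of the augmented score distribution.
    order = np.argsort(scores)
    cdf = np.cumsum(p[order])
    return scores[order][np.searchsorted(cdf, 1 - alpha)]
```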
Distribution-Free Predictive Inference for Regression
A general framework for distribution-free predictive inference in regression is developed using conformal inference; it allows for the construction of a prediction band for the response variable using any estimator of the regression function, and introduces a model-free notion of variable importance, called leave-one-covariate-out (LOCO) inference.
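The prediction-band construction can be illustrated in its split-conformal special case around any fitted regressor. A minimal sketch, assuming `model` exposes a scikit-learn-style `.predict`:

```python
import numpy as np

def split_conformal_band(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction intervals around a fitted regression estimator (sketch)."""
    # Conformity scores: absolute residuals on a held-out calibration set.
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(y_cal)
    # Finite-sample corrected (1 - alpha) quantile of the residuals.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    qhat = np.sort(residuals)[k - 1]
    preds = model.predict(X_test)
    return preds - qhat, preds + qhat    # lower and upper band
```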
Evaluating model calibration in classification
This work develops a general theoretical calibration evaluation framework grounded in probability theory, and points out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques.
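A common concrete instance of such evaluation is the binned expected calibration error (ECE). The sketch below is the standard top-label estimator, not the paper's refined framework:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Binned estimate of top-label expected calibration error (ECE)."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Weight each bin by its frequency; add the gap between accuracy and confidence.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```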
Verified Uncertainty Calibration
The scaling-binning calibrator is introduced, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration; a model's calibration error is also estimated more accurately using an estimator from the meteorological community.
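The two stages can be sketched for a binary problem as a logistic scaling step followed by equal-mass binning of the scaled outputs; this is a simplification of the paper's calibrator, and the helper name is ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def scaling_binning(scores_fit, y_fit, scores_bin, n_bins=10):
    """Two-stage scaling-binning recalibration for binary classification (sketch).

    scores_fit, y_fit: split used to fit the parametric (logistic) scaling step
    scores_bin:        separate split used to form bins of the scaled outputs
    Returns (scaler, bin_edges, bin_values) defining the recalibration map.
    """
    # Stage 1: parametric scaling via logistic regression on the raw score.
    scaler = LogisticRegression().fit(scores_fit.reshape(-1, 1), y_fit)
    scaled = scaler.predict_proba(scores_bin.reshape(-1, 1))[:, 1]
    # Stage 2: equal-mass bins; each bin outputs its mean scaled value, which reduces variance.
    edges = np.quantile(scaled, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = np.clip(np.digitize(scaled, edges[1:-1]), 0, n_bins - 1)
    values = np.array([scaled[bin_ids == b].mean() if (bin_ids == b).any() else 0.5
                       for b in range(n_bins)])
    return scaler, edges, values
```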
Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
A natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification, is proposed.
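One way to realize such a map is multinomial logistic regression on the logarithm of the predicted probabilities, which corresponds to the full Dirichlet calibration parameterization. A minimal sketch, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_dirichlet_calibration(probs_cal, labels_cal, eps=1e-12):
    """Fit Dirichlet-style calibration as multinomial logistic regression on log-probabilities."""
    log_p = np.log(np.clip(probs_cal, eps, 1.0))
    return LogisticRegression(max_iter=1000).fit(log_p, labels_cal)

def apply_dirichlet_calibration(calibrator, probs, eps=1e-12):
    """Return recalibrated class probabilities for new predictions."""
    log_p = np.log(np.clip(probs, eps, 1.0))
    return calibrator.predict_proba(log_p)
```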
Distribution-Free Prediction Sets
This article considers the problem of constructing nonparametric tolerance/prediction sets by starting from the general conformal prediction approach, and uses a kernel density estimator as a measure of agreement between a sample point and the underlying distribution.
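A simplified split variant of this idea fits a kernel density estimate on one half of the sample, thresholds it at an empirical quantile of the densities computed on the other half, and includes every candidate point whose density clears the threshold. The one-dimensional sketch below is illustrative and not the article's full conformal construction:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_split_conformal_set(sample, grid, alpha=0.1):
    """Density-based prediction set via split conformal with a KDE conformity score (sketch).

    sample: (n,) one-dimensional observations
    grid:   candidate points to test for inclusion in the prediction set
    """
    n = len(sample)
    fit, cal = sample[: n // 2], sample[n // 2:]
    kde = gaussian_kde(fit)                    # density fitted on one half of the data
    cal_scores = kde(cal)                      # conformity scores on the other half
    m = len(cal_scores)
    # Include points whose density is at least the floor((m + 1) * alpha)-th smallest score.
    k = max(int(np.floor((m + 1) * alpha)), 1)
    threshold = np.sort(cal_scores)[k - 1]
    return kde(grid) >= threshold
```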