Performance Prediction Under Dataset Shift
@article{Maggio2022PerformancePU,
  title   = {Performance Prediction Under Dataset Shift},
  author  = {Simona Maggio and Victor Bouvier and Leo Dreyfus-Schmidt},
  journal = {2022 26th International Conference on Pattern Recognition (ICPR)},
  year    = {2022},
  pages   = {2466-2474}
}
ML models deployed in production often have to face unknown domain changes that are fundamentally different from their training settings. Performance prediction models carry out the crucial task of measuring the impact of these changes on model performance. We study the generalization capabilities of various performance prediction models to new domains by learning on generated synthetic perturbations. Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the…
References
Learning Prediction Intervals for Model Performance
- Computer Science · AAAI
- 2021
This work uses transfer learning to train an uncertainty model that estimates the uncertainty of model performance predictions, and argues that this makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.
Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data
- Computer Science · HILDA@SIGMOD
- 2019
This work proposes a performance predictor for pretrained black-box models that assists non-ML experts: the predictor can be shipped alongside the model and automatically warns end users of unexpected performance drops.
Predicting with Confidence on Unseen Distributions
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This investigation determines that common distributional distances, such as Fréchet distance or Maximum Mean Discrepancy, fail to induce reliable estimates of performance under distribution shift, and finds that the proposed difference of confidences (DoC) approach yields successful estimates of a classifier’s performance over a variety of shifts and model architectures.
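A minimal sketch of the difference-of-confidences idea described above (illustrative function names, not the authors' code): measure the drop in average model confidence from source to target data, and subtract it from the known source accuracy as a simple estimate of target accuracy.

```python
import numpy as np

def difference_of_confidences(conf_source, conf_target):
    """DoC: drop in mean top-class confidence from source to target data."""
    return float(np.mean(conf_source) - np.mean(conf_target))

def doc_accuracy_estimate(source_accuracy, conf_source, conf_target):
    """Estimate target accuracy by subtracting the confidence drop from the
    measured source accuracy (a simple member of the DoC family)."""
    return source_accuracy - difference_of_confidences(conf_source, conf_target)
```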
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
- Computer Science · ICLR
- 2022
Average Thresholded Confidence (ATC) is proposed, a practical method that learns a threshold on the model’s confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
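A minimal sketch of that thresholded-confidence recipe (hypothetical helper names; assumes per-example top-class confidences are available): calibrate the threshold on labeled validation data so that the fraction of confident examples matches validation accuracy, then report the fraction of unlabeled target examples above that threshold as the accuracy estimate.

```python
import numpy as np

def fit_atc_threshold(val_confidences, val_correct):
    """Pick a threshold t so that the share of validation points with
    confidence > t roughly matches the observed validation accuracy."""
    val_acc = np.mean(val_correct)
    # Candidate thresholds: the observed confidences themselves.
    for t in np.sort(val_confidences):
        if np.mean(val_confidences > t) <= val_acc:
            return t
    return val_confidences.max()

def predict_accuracy(target_confidences, threshold):
    """Estimated target accuracy = fraction of unlabeled target points
    whose confidence exceeds the calibrated threshold."""
    return float(np.mean(target_confidences > threshold))

# Illustration with synthetic confidences only.
rng = np.random.default_rng(0)
val_conf = rng.uniform(0.5, 1.0, size=1000)
val_correct = rng.uniform(size=1000) < val_conf   # toy "correctness" labels
t = fit_atc_threshold(val_conf, val_correct)
target_conf = rng.uniform(0.4, 1.0, size=1000)    # confidences on a shifted domain
print(predict_accuracy(target_conf, t))
```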
To Annotate or Not? Predicting Performance Drop under Domain Shift
- Computer Science · EMNLP
- 2019
This paper investigates three families of methods (H-divergence, reverse classification accuracy, and confidence measures), shows how they can be used to predict the performance drop, and studies their robustness to adversarial domain shifts.
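As one concrete example, a minimal sketch of the reverse classification accuracy idea mentioned above (illustrative names; the choice of a logistic-regression reverse model is an assumption): pseudo-label the unlabeled target data with the deployed model, fit a fresh classifier on those pseudo-labels, and score it on the labeled source data; a low score signals a likely performance drop.

```python
from sklearn.linear_model import LogisticRegression

def reverse_classification_accuracy(black_box, X_src, y_src, X_tgt):
    """Pseudo-label the target domain with the deployed model, train a
    reverse model on those pseudo-labels, and score it on labeled source
    data; a large drop suggests the deployed model degrades on the target."""
    pseudo_labels = black_box.predict(X_tgt)
    reverse_model = LogisticRegression(max_iter=1000)
    reverse_model.fit(X_tgt, pseudo_labels)
    return reverse_model.score(X_src, y_src)
```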
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- Computer Science · NeurIPS
- 2019
This work presents a large-scale benchmark of existing state-of-the-art methods on classification problems, evaluating the effect of dataset shift on accuracy and calibration, and finds that traditional post-hoc calibration does indeed fall short, as do several other previous methods.
Underspecification Presents Challenges for Credibility in Modern Machine Learning
- Computer Science · arXiv
- 2020
This work shows that underspecification appears in a wide variety of practical ML pipelines and argues for explicitly accounting for it in any modeling pipeline intended for real-world deployment.
Are Labels Always Necessary for Classifier Accuracy Evaluation?
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work constructs a meta-dataset, i.e., a dataset composed of datasets generated from the original images via transformations such as rotation, background substitution, and foreground scaling, and reports reasonable and promising predictions of model accuracy.
Measuring Robustness to Natural Distribution Shifts in Image Classification
- Computer Science · NeurIPS
- 2020
It is found that there is often little to no transfer of robustness from current synthetic to natural distribution shifts, indicating that distribution shifts arising in real data are currently an open research problem.
Detecting and Correcting for Label Shift with Black Box Predictors
- Computer Science · ICML
- 2018
Black Box Shift Estimation (BBSE) is proposed to estimate the test label distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
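A minimal sketch of the BBSE linear-system view summarized above (illustrative names, not the authors' implementation): estimate the joint frequency of (predicted label, true label) on labeled source data, estimate the predicted-label distribution on unlabeled target data, and solve for the importance weights p_t(y)/p_s(y), from which the target label marginals follow.

```python
import numpy as np

def bbse_label_shift(y_true_src, y_pred_src, y_pred_tgt, n_classes):
    """Estimate target label marginals p_t(y) from a black-box predictor.

    Solves C w = mu_t for w = p_t(y) / p_s(y), where C[i, j] is the joint
    source frequency of (predicted = i, true = j) and mu_t is the
    distribution of predictions on the unlabeled target set.  Requires C
    to be invertible, as in the paper's assumption."""
    C = np.zeros((n_classes, n_classes))
    for yt, yp in zip(y_true_src, y_pred_src):
        C[yp, yt] += 1.0
    C /= len(y_true_src)                                    # joint p_s(y_hat, y)
    mu_t = np.bincount(y_pred_tgt, minlength=n_classes) / len(y_pred_tgt)
    w = np.linalg.solve(C, mu_t)                            # weights p_t(y) / p_s(y)
    p_s = np.bincount(y_true_src, minlength=n_classes) / len(y_true_src)
    p_t = np.clip(w * p_s, 0.0, None)
    return p_t / p_t.sum()
```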