# Learning Prediction Intervals for Model Performance

```bibtex
@inproceedings{Elder2020LearningPI,
  title     = {Learning Prediction Intervals for Model Performance},
  author    = {Benjamin Elder and Matthew Arnold and Anupama Murthi and Jir{\'i} Navr{\'a}til},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020}
}
```

Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions has prevented their widespread adoption. We address this core problem of…
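The core setup described above — predicting a model's accuracy on unlabeled data without ground-truth labels — can be illustrated with a minimal sketch. This is not the paper's method; it is a hypothetical baseline meta-model that bins the base model's confidence on labeled data and reuses the per-bin empirical accuracy to score unlabeled data (all names and the synthetic data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: the base model's max-softmax confidence on each sample,
# and whether its prediction was actually correct (known only on labeled data).
conf_labeled = rng.uniform(0.5, 1.0, size=2000)
correct_labeled = (rng.uniform(size=2000) < conf_labeled).astype(float)

# A trivial meta-model: bin confidence and record empirical accuracy per bin.
bins = np.linspace(0.5, 1.0, 11)
idx = np.clip(np.digitize(conf_labeled, bins) - 1, 0, 9)
bin_acc = np.array([correct_labeled[idx == b].mean() for b in range(10)])

# On unlabeled data, estimate accuracy from confidence alone.
conf_unlabeled = rng.uniform(0.5, 1.0, size=500)
uidx = np.clip(np.digitize(conf_unlabeled, bins) - 1, 0, 9)
predicted_accuracy = bin_acc[uidx].mean()  # estimated accuracy on the unlabeled set
```

The paper's contribution goes beyond such point estimates: it attaches prediction intervals to performance estimates like this one, so that users can calibrate how much to trust them.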

## 7 Citations

### Performance Prediction Under Dataset Shift

- Computer Science · 2022 26th International Conference on Pattern Recognition (ICPR)
- 2022

Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the-art shift detection metrics are not expressive enough to generalize to unseen domains, while Error Predictors bring a consistent improvement in performance prediction under shift.

### Post-hoc Uncertainty Learning using a Dirichlet Meta-Model

- Computer Science · ArXiv
- 2022

A novel Bayesian meta-model is proposed to augment pre-trained models with better uncertainty quantification abilities; the approach is effective, computationally efficient, and feasible in many situations.

### Uncertainty Quantification for Rule-Based Models

- Computer Science · ArXiv
- 2022

This work proposes an uncertainty quantification framework in the form of a meta-model that takes any binary classifier with binary output as a black box and estimates the prediction accuracy of that base model at a given input, along with a level of confidence in that estimation.

### Stratification by Tumor Grade Groups in a Holistic Evaluation of Machine Learning for Brain Tumor Segmentation

- Computer Science · Frontiers in Neuroscience
- 2021

This work performs a comprehensive evaluation of a glioma segmentation ML algorithm by stratifying data by specific tumor grade groups and evaluates these algorithms on each of the four axes of model evaluation—diagnostic performance, model confidence, robustness, and data quality.

### ANN-Based LUBE Model for Interval Prediction of Compressive Strength of Concrete

- Engineering · Iranian Journal of Science and Technology, Transactions of Civil Engineering
- 2021

This study uses ANN-based lower upper bound estimation (LUBE) method for construction of prediction intervals (PIs) at different confidence levels (CL) for the compressive strength of concrete for…
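LUBE-style methods train a network with two outputs (lower and upper bound) against a loss that trades off interval coverage and width. The two quantities that loss balances can be sketched directly; the interval construction below is a synthetic stand-in, not the study's model:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: fraction of targets inside the interval."""
    return np.mean((y >= lower) & (y <= upper))

def mpiw(lower, upper):
    """Mean prediction interval width: the quantity LUBE losses penalize."""
    return np.mean(upper - lower)

# Toy data: noisy point predictions wrapped in fixed half-width intervals.
rng = np.random.default_rng(1)
y = rng.normal(size=1000)
pred = y + rng.normal(0.0, 0.2, size=1000)
lower, upper = pred - 0.5, pred + 0.5
```

A LUBE-style objective would, roughly, minimize MPIW subject to PICP meeting the target confidence level; the network's bound outputs replace the fixed ±0.5 half-width used here.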

### Uncertainty Quantification 360: A Hands-on Tutorial

- Computer Science · 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)
- 2022

This tutorial presents an open source Python package named Uncertainty Quantification 360 (UQ360), a toolkit that provides a broad range of capabilities for quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle.

### Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI

- Computer Science · ArXiv
- 2021

An open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models, providing a broad range of capabilities to streamline and foster common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle.

## References

Showing 1-10 of 49 references

### Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data

- Computer Science · HILDA@SIGMOD
- 2019

This work proposes an approach to assist non-ML experts working with pretrained ML models: a performance predictor for pretrained black-box models that can be combined with the model and automatically warns end users of unexpected performance drops.

### Predictive Uncertainty Estimation via Prior Networks

- Computer Science · NeurIPS
- 2018

This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs) which explicitly models distributional uncertainty by parameterizing a prior distribution over predictive distributions and evaluates PNs on the tasks of identifying out-of-distribution samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods.

### Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

- Computer Science · NIPS
- 2017

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
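The deep-ensembles recipe treats M independently trained networks as a uniform mixture: the predictive mean averages the members, and the predictive variance combines each member's own variance (aleatoric) with the spread of the member means (epistemic). A minimal sketch of that aggregation step, with hypothetical per-member outputs standing in for trained networks:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5  # ensemble size

# Hypothetical per-member outputs at one input x: each member predicts a
# Gaussian, i.e. a mean mu_m(x) and a variance sigma_m^2(x).
means = rng.normal(loc=1.0, scale=0.1, size=M)
variances = rng.uniform(0.04, 0.09, size=M)

# Uniform-mixture moments: total variance = average member variance
# (aleatoric term) + variance of member means (epistemic term).
mixture_mean = means.mean()
mixture_var = variances.mean() + means.var()
```

When the members agree, the epistemic term vanishes and the mixture variance collapses to the average member variance; disagreement widens it, which is what makes the ensemble's uncertainty estimate informative.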

### On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation

- Computer Science · ArXiv
- 2020

The experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and it is observed that these simple methods strongly outperform a vanilla point-estimate SGD in some complex benchmarks like ImageNet.

### Inductive Conformal Prediction: Theory and Application to Neural Networks

- Computer Science
- 2008

The Bayesian framework and PAC theory can be used to produce upper bounds on the probability of error for a given algorithm with respect to some confidence level 1 − δ; both of these approaches, however, have their drawbacks.
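Inductive conformal prediction, by contrast, gives distribution-free coverage: compute nonconformity scores (here, absolute residuals) on a held-out calibration set and use their ⌈(n+1)(1−δ)⌉-th smallest value as the interval half-width. A minimal sketch for regression, with synthetic residuals standing in for a real calibration set:

```python
import numpy as np

def icp_interval(calib_residuals, y_pred, delta=0.1):
    """Inductive conformal interval: y_pred +/- q, where q is the
    ceil((n+1)*(1-delta))-th smallest calibration nonconformity score."""
    n = len(calib_residuals)
    k = int(np.ceil((n + 1) * (1 - delta)))
    q = np.sort(calib_residuals)[min(k, n) - 1]
    return y_pred - q, y_pred + q

# Hypothetical calibration residuals |y - model(x)| from a held-out set.
rng = np.random.default_rng(3)
residuals = np.abs(rng.normal(0.0, 1.0, size=999))
lo, hi = icp_interval(residuals, y_pred=2.0, delta=0.1)
```

Under exchangeability, intervals built this way cover the true value with probability at least 1 − δ, regardless of the underlying model or data distribution.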

### Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes

- Computer Science · AISTATS
- 2019

A novel confidence scoring mechanism for deep neural networks based on a two-model paradigm involving a base model and a meta-model that outperforms various baselines in a filtering task, i.e., task of rejecting samples with low confidence.

### Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

- Computer Science · ICML
- 2016

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
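Operationally, MC dropout means leaving dropout switched on at test time and treating repeated stochastic forward passes as samples from the approximate posterior predictive. A toy numpy sketch (the weights are random stand-ins for a trained network, not a real model):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical one-hidden-layer regressor; weights are placeholders.
W1, b1 = rng.normal(size=(8, 1)), np.zeros((8, 1))
W2, b2 = rng.normal(size=(1, 8)), np.zeros((1, 1))
p_drop = 0.2

def stochastic_forward(x):
    """One forward pass with dropout left ON, as MC dropout requires at test time."""
    h = np.maximum(W1 @ x + b1, 0.0)              # ReLU hidden layer
    mask = rng.uniform(size=h.shape) > p_drop     # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)                 # inverted-dropout scaling
    return (W2 @ h + b2).item()

# T stochastic passes yield a predictive mean and an uncertainty estimate.
x = np.array([[0.5]])
samples = np.array([stochastic_forward(x) for _ in range(100)])
pred_mean, pred_var = samples.mean(), samples.var()
```

The sample variance across passes is the (epistemic) uncertainty estimate the paper interprets as approximate Bayesian inference in a deep Gaussian process.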

### FFORMPP: Feature-based forecast model performance prediction

- Computer Science · International Journal of Forecasting
- 2021

### Randomized Prior Functions for Deep Reinforcement Learning

- Computer Science · NeurIPS
- 2018

It is shown that this approach is efficient with linear representations, provides simple illustrations of its efficacy with nonlinear representations and scales to large-scale problems far better than previous attempts.

### Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers

- Computer Science · ICML
- 2001

It is concluded that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates the authors recommend smoothing by m-estimation and a new variant of pruning called curtailment.
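The two ingredients of that recipe — histogram binning plus m-estimate smoothing, which shrinks sparse bins toward a prior instead of emitting raw 0/1 frequencies — combine into a short sketch (synthetic scores and labels; function name and defaults are illustrative, not from the paper):

```python
import numpy as np

def binned_calibration(scores, correct, n_bins=10, m=10.0, prior=0.5):
    """Histogram binning with m-estimate smoothing: each bin's calibrated
    probability is (hits + m*prior) / (count + m)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    calibrated = np.empty(n_bins)
    for b in range(n_bins):
        mask = idx == b
        # Empty bins fall back to the prior instead of dividing by zero.
        calibrated[b] = (correct[mask].sum() + m * prior) / (mask.sum() + m)
    return edges, calibrated

# Synthetic classifier scores with per-sample correctness proportional to score.
rng = np.random.default_rng(5)
scores = rng.uniform(size=5000)
correct = (rng.uniform(size=5000) < scores).astype(float)
edges, calibrated = binned_calibration(scores, correct)
```

At prediction time, a raw score is mapped to the calibrated probability of the bin it falls into; the m-estimate keeps thinly populated bins from returning extreme probabilities.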