Learning Prediction Intervals for Model Performance

@inproceedings{Elder2020LearningPI,
  title={Learning Prediction Intervals for Model Performance},
  author={Benjamin Elder and Matthew Arnold and Anupama Murthi and Jir{\'i} Navr{\'a}til},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2020}
}
Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions have prevented their widespread adoption. We address this core problem of… 
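
The full method is described in the paper; as a rough illustration of the general idea (a meta-model that turns statistics of a base classifier's behavior on an unlabeled batch into a prediction interval for that batch's accuracy), the sketch below uses scikit-learn quantile gradient boosting. The features and estimator are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: a quantile-regression meta-model mapping simple
# confidence statistics of a base classifier on a batch to a prediction
# interval for that batch's accuracy. Feature choices are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def batch_features(base_model, X_batch):
    """Summarize the base model's softmax confidences on one batch."""
    proba = base_model.predict_proba(X_batch)
    top = proba.max(axis=1)
    return np.array([top.mean(), top.std(), np.median(top), (top < 0.6).mean()])

def fit_interval_metamodel(base_model, labeled_batches, alpha=0.1):
    """Fit lower/upper quantile regressors of batch accuracy from batch features."""
    F = np.stack([batch_features(base_model, Xb) for Xb, _ in labeled_batches])
    acc = np.array([(base_model.predict(Xb) == yb).mean() for Xb, yb in labeled_batches])
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(F, acc)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(F, acc)
    return lo, hi

def predict_interval(base_model, lo, hi, X_unlabeled_batch):
    f = batch_features(base_model, X_unlabeled_batch).reshape(1, -1)
    return float(lo.predict(f)[0]), float(hi.predict(f)[0])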

Performance Prediction Under Dataset Shift

Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the-art shift detection metrics are not expressive enough to generalize to unseen domains, while Error Predictors bring a consistent improvement in performance prediction under shift.

Post-hoc Uncertainty Learning using a Dirichlet Meta-Model

A novel Bayesian meta-model is proposed to augment pre-trained models with better uncertainty quantification abilities; it is effective, computationally efficient, and feasible in many situations.

Uncertainty Quantification for Rule-Based Models

This work proposes an uncertainty quantification framework in the form of a meta-model that takes any binary classifier with binary output as a black box and estimates the prediction accuracy of that base model at a given input along with a level of confidence on that estimation.
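
A minimal sketch of this generic meta-model pattern (not the paper's framework): train a second model to predict whether the black-box base classifier is correct at a given input, and read its probability as an accuracy estimate. The choice of logistic regression on raw inputs is an assumption for illustration.

# Sketch: learn to predict whether a black-box base classifier is correct at x.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_accuracy_metamodel(base_predict, X_meta, y_meta):
    correct = (base_predict(X_meta) == y_meta).astype(int)   # 1 if base model is right
    return LogisticRegression(max_iter=1000).fit(X_meta, correct)

def estimated_accuracy(meta, x):
    # Probability that the base model's prediction at x is correct.
    return meta.predict_proba(x.reshape(1, -1))[0, 1]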

Stratification by Tumor Grade Groups in a Holistic Evaluation of Machine Learning for Brain Tumor Segmentation

This work performs a comprehensive evaluation of a glioma segmentation ML algorithm by stratifying data by specific tumor grade groups and evaluates these algorithms on each of the four axes of model evaluation—diagnostic performance, model confidence, robustness, and data quality.

ANN-Based LUBE Model for Interval Prediction of Compressive Strength of Concrete

This study uses the ANN-based lower upper bound estimation (LUBE) method to construct prediction intervals (PIs) at different confidence levels (CLs) for the compressive strength of concrete.
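
For context, the LUBE idea is a network with two outputs, a lower and an upper bound, trained to keep targets inside the interval while penalizing interval width. The original LUBE optimizes a non-differentiable coverage-width criterion (often via simulated annealing); the gradient-friendly soft surrogate below is an assumption made purely for this sketch.

# Simplified illustration of the LUBE idea in PyTorch.
import torch
import torch.nn as nn

class LUBENet(nn.Module):
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        lower = out[:, 0]
        upper = lower + nn.functional.softplus(out[:, 1])   # enforce upper >= lower
        return lower, upper

def soft_coverage_width_loss(lower, upper, y, target_coverage=0.95, lam=10.0):
    width = (upper - lower).mean()
    # Soft indicator of y falling inside [lower, upper].
    inside = torch.sigmoid(50 * (y - lower)) * torch.sigmoid(50 * (upper - y))
    coverage = inside.mean()
    return width + lam * torch.relu(target_coverage - coverage) ** 2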

Uncertainty Quantification 360: A Hands-on Tutorial

  • Soumya Ghosh, Q. Liao, Yunfeng Zhang
  • Computer Science
  • 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)
  • 2022
This tutorial presents an open source Python package named Uncertainty Quantification 360 (UQ360), a toolkit that provides a broad range of capabilities for quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle.

Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI

An open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models, providing a broad range of capabilities to streamline and foster common practices for quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle.
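
Among other capabilities, toolkits like UQ360 include metrics for evaluating prediction intervals. The snippet below computes two standard interval metrics, PICP and mean interval width, in plain NumPy rather than through the toolkit's own API (whose exact function names are not reproduced here).

# Two standard metrics for evaluating prediction intervals, written generically.
import numpy as np

def picp(y_true, y_lower, y_upper):
    """Prediction Interval Coverage Probability: fraction of targets inside the interval."""
    return np.mean((y_true >= y_lower) & (y_true <= y_upper))

def mpiw(y_lower, y_upper):
    """Mean Prediction Interval Width: average width of the intervals."""
    return np.mean(y_upper - y_lower)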

References

Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data

This work proposes a performance predictor for pretrained black-box models that assists non-ML experts: it can be combined with the model and automatically warns end users of unexpected performance drops on unseen data.

Predictive Uncertainty Estimation via Prior Networks

This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs) which explicitly models distributional uncertainty by parameterizing a prior distribution over predictive distributions and evaluates PNs on the tasks of identifying out-of-distribution samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods.
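
Prior Networks output the concentration parameters of a Dirichlet over categorical predictive distributions. A common way to use that output is the standard decomposition into total, expected data, and distributional uncertainty, roughly as sketched below (not the paper's code).

# Uncertainty decomposition for a Dirichlet output with concentrations alpha (shape [K]).
import numpy as np
from scipy.special import digamma

def dirichlet_uncertainties(alpha):
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum()
    p = alpha / alpha0                                  # expected categorical
    total = -np.sum(p * np.log(p))                      # entropy of the mean
    expected_data = -np.sum(p * (digamma(alpha + 1) - digamma(alpha0 + 1)))
    mutual_info = total - expected_data                 # distributional uncertainty
    return total, expected_data, mutual_info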

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
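
The full recipe also uses proper scoring rules and adversarial training; the core mechanism, however, is simply averaging over independently trained models, as in this minimal sketch.

# Deep-ensemble predictive uncertainty for classification: average the members'
# predicted probabilities and use the entropy of the average as the uncertainty.
import numpy as np

def ensemble_predict(models, X):
    probs = np.stack([m.predict_proba(X) for m in models])   # [M, N, K]
    mean_probs = probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)
    return mean_probs, entropy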

On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation

The experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and it is observed that these simple methods strongly outperform a vanilla point-estimate SGD in some complex benchmarks like ImageNet.

Inductive Conformal Prediction: Theory and Application to Neural Networks

The Bayesian framework and PAC theory can be used to produce upper bounds on the probability of error for a given algorithm with respect to some confidence level 1 − δ; both of these approaches, however, have their drawbacks.
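
A minimal sketch of the inductive (split) conformal procedure for classification, under the usual assumptions: compute nonconformity scores on a held-out calibration set, take their finite-sample-corrected (1 − δ) quantile, and return the set of labels that conform at test time.

# Split conformal prediction sketch for a classifier with predict_proba.
import numpy as np

def calibrate(model, X_cal, y_cal, delta=0.1):
    proba = model.predict_proba(X_cal)
    scores = 1.0 - proba[np.arange(len(y_cal)), y_cal]      # nonconformity of the true label
    k = int(np.ceil((len(scores) + 1) * (1 - delta)))        # conformal quantile index
    return np.sort(scores)[min(k, len(scores)) - 1]

def prediction_set(model, x, q):
    proba = model.predict_proba(x.reshape(1, -1))[0]
    return np.where(1.0 - proba <= q)[0]                     # labels whose score conforms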

Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes

A novel confidence scoring mechanism for deep neural networks based on a two-model paradigm involving a base model and a meta-model, which outperforms various baselines in a filtering task, i.e., the task of rejecting samples with low confidence.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
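
In practice this amounts to Monte Carlo dropout at test time: keep dropout active, run several stochastic forward passes, and use the mean and spread of the outputs. A minimal sketch (a full implementation would flip only the dropout modules to train mode, not the whole model):

# MC dropout sketch in PyTorch.
import torch

def mc_dropout_predict(model, x, n_samples=30):
    model.train()                      # keeps dropout layers stochastic (also affects batchnorm)
    with torch.no_grad():
        samples = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    model.eval()
    return samples.mean(dim=0), samples.std(dim=0)   # predictive mean and per-class spread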

FFORMPP: Feature-based forecast model performance prediction

Randomized Prior Functions for Deep Reinforcement Learning

It is shown that this approach is efficient with linear representations; simple illustrations demonstrate its efficacy with nonlinear representations, and it scales to large-scale problems far better than previous attempts.
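
The core construction is small: each ensemble member is the sum of a trainable network and a fixed, randomly initialized "prior" network, with only the trainable part receiving gradients. A sketch, where the beta scaling name is an assumed hyperparameter:

# Randomized prior function member in PyTorch.
import torch
import torch.nn as nn

class RandomizedPriorMember(nn.Module):
    def __init__(self, make_net, beta=1.0):
        super().__init__()
        self.trainable = make_net()
        self.prior = make_net()
        for p in self.prior.parameters():
            p.requires_grad_(False)    # the prior stays fixed at its random initialization
        self.beta = beta

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)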

Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers

It is concluded that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates they recommend smoothing by m-estimation and a new variant of pruning called curtailment.
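
For reference, both calibration ideas are short to state in code. Below is a generic sketch of histogram binning of raw scores and of m-estimate smoothing of a leaf's class frequency; the bin count, m, and prior b are illustrative choices, not the paper's settings.

# Histogram binning and m-estimate smoothing, sketched in NumPy.
import numpy as np

def histogram_binning(scores_cal, correct_cal, scores_test, n_bins=10):
    """Replace each test score with the empirical accuracy of its calibration bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins_cal = np.clip(np.digitize(scores_cal, edges) - 1, 0, n_bins - 1)
    bins_test = np.clip(np.digitize(scores_test, edges) - 1, 0, n_bins - 1)
    bin_acc = np.array([correct_cal[bins_cal == b].mean() if np.any(bins_cal == b) else 0.5
                        for b in range(n_bins)])
    return bin_acc[bins_test]

def m_estimate(k, n, b=0.5, m=10.0):
    """Smoothed probability from k positives out of n, pulled toward prior b."""
    return (k + b * m) / (n + m)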