Corpus ID: 235313948

Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI

Soumya Shubhra Ghosh, Qingzi Vera Liao, Karthikeyan Natesan Ramamurthy, Jiří Navrátil, Prasanna Sattigeri, Kush R. Varshney, Yunfeng Zhang
In this paper, we describe an open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models. The goal of this toolkit is twofold: first, to provide a broad range of capabilities to streamline as well as foster the common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle; second, to encourage further exploration of UQ’s connections to other pillars of trustworthy AI… 
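As a hypothetical illustration of the kind of workflow such a toolkit streamlines (this is a plain scikit-learn sketch, not UQ360's API), prediction intervals for a regression task can be quantified by fitting quantile models alongside a median model:

```python
# Generic sketch of quantifying predictive uncertainty via quantile regression.
# All names below are illustrative; this does not use the UQ360 package.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)  # noisy target

# one model per quantile: lower bound, median prediction, upper bound
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}

X_test = np.array([[2.0], [5.0]])
lower = models[0.05].predict(X_test)
median = models[0.5].predict(X_test)
upper = models[0.95].predict(X_test)

for lo, md, up in zip(lower, median, upper):
    print(f"prediction {md:.2f} with 90% interval [{lo:.2f}, {up:.2f}]")
```

Communicating the interval alongside the point prediction, rather than the point alone, is the kind of practice the toolkit aims to make routine.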


Uncertainty-Based Rejection in Machine Learning: Implications for Model Development and Interpretability
This work focuses on applying UQ in practice, closing the gap in its utility in the ML pipeline and giving insights into how UQ can be used to improve model development and interpretability.
Human-Centered Explainable AI (XAI): From Algorithms to User Experiences
This chapter begins with a high-level overview of the technical landscape of XAI algorithms, then selectively surveys recent HCI works that take human-centered approaches to design, evaluate, and provide conceptual and methodological tools for XAI.
A Survey on Uncertainty Toolkits for Deep Learning
This work investigates 11 toolkits with respect to modeling and evaluation capabilities, provides an in-depth comparison of the three most promising ones, namely Pyro, TensorFlow Probability, and Uncertainty Quantification 360, and concludes that the last has the largest methodological scope.
Reputational Risk Associated with Big Data Research and Development: An Interdisciplinary Perspective
This work suggests a reframing of the public R&D ‘brand’ that responds to legitimate concerns related to data collection, development, and the implementation of big data technologies, and offers as a case study Australian agriculture.
Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies
The need to develop common frameworks to account for the design and research spaces of human-AI decision making is highlighted, so that researchers can make rigorous choices in study design, and the research community can build on each other’s work and produce generalizable scientific knowledge.
Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits
The first in-depth empirical exploration of how industry practitioners (try to) work with fairness toolkits is conducted, highlighting opportunities for the design of future open-source fairness toolkits that can support practitioners in better contextualizing, communicating, and collaborating around ML fairness efforts.
Investigating Explainability of Generative AI for Code through Scenario-based Design
This work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.
Tailored Uncertainty Estimation for Deep Learning Systems
This work proposes a framework that guides the selection of a suitable uncertainty estimation method and provides strategies to validate this choice and to uncover structural weaknesses, and helps to foster trustworthy DL systems.
TyXe: Pyro-based Bayesian neural nets for Pytorch
TyXe, a Bayesian neural network library built on top of Pytorch and Pyro, is introduced, offering a broad range of researchers and practitioners alike practical access to uncertainty estimation techniques.
Explainable Global Error Weighted on Feature Importance: The xGEWFI metric to evaluate the error of data imputation and data augmentation
Evaluating the performance of an algorithm is crucial. Evaluating the performance of data imputation and data augmentation can be similar, since in both cases the generated data can be compared with an original…


Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty
This work describes how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems and outlines methods for displaying uncertainty to stakeholders and recommends how to collect information required for incorporating uncertainty into existing ML pipelines.
Uncertainty Characteristics Curves: A Systematic Assessment of Prediction Intervals
It is argued that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
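Systematic assessments of prediction intervals, such as the one above, build on basic interval-quality metrics. As a hedged illustration (function names here are mine, not from a specific library), coverage and average width can be computed as:

```python
# Two standard prediction-interval quality metrics that interval assessments
# trade off against each other: coverage (PICP) and mean interval width.
import numpy as np

def picp(y_true, lower, upper):
    """Prediction Interval Coverage Probability: fraction of truths inside."""
    return np.mean((y_true >= lower) & (y_true <= upper))

def mean_width(lower, upper):
    """Average interval width; narrower is better at equal coverage."""
    return np.mean(upper - lower)

y = np.array([1.0, 2.0, 3.0, 4.0])
lo = np.array([0.5, 1.8, 2.0, 4.2])   # last interval misses the truth
hi = np.array([1.5, 2.5, 3.5, 4.8])
print(picp(y, lo, hi))        # 0.75: three of four points covered
print(mean_width(lo, hi))
```

An interval predictor can trivially reach perfect coverage by widening its intervals, which is why the two metrics are only meaningful together.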
Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making
It is shown that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors.
Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making
Frequency-based visualizations previously shown to allow people to better extract probabilities (quantile dotplots) yielded better decisions, and cumulative distribution function plots performed nearly as well, and both outperformed textual uncertainty, which was sensitive to the probability interval communicated.
Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors
A novel approach for building calibrated estimators is developed that uses separate models for prediction and interval estimation, and poses a bi-level optimization problem that allows the former to leverage estimates from the latter through an "uncertainty matching" strategy.
Learning Prediction Intervals for Model Performance
This work uses transfer learning to train an uncertainty model to estimate the uncertainty of model performance predictions, and believes this result makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.
The Comparison and Evaluation of Forecasters.
In this paper we present methods for comparing and evaluating forecasters whose predictions are presented as their subjective probability distributions of various random variables that will be…
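One common way to compare probability forecasters, shown here as a minimal sketch rather than the paper's specific methodology, is a proper scoring rule such as the Brier score, where a lower mean score indicates the better-calibrated, sharper forecaster:

```python
# Compare two probability forecasters on the same binary events using the
# Brier score (mean squared error between forecast probabilities and outcomes).
import numpy as np

def brier(probs, outcomes):
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    return np.mean((probs - outcomes) ** 2)

outcomes = [1, 0, 1, 1, 0]               # what actually happened
forecaster_a = [0.9, 0.2, 0.8, 0.7, 0.1]  # confident and mostly right
forecaster_b = [0.6, 0.5, 0.5, 0.6, 0.4]  # hedges near 0.5

print(brier(forecaster_a, outcomes))  # smaller score: A is the better forecaster
print(brier(forecaster_b, outcomes))
```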
Approximate Cross-Validation for Structured Models
This work proves, both theoretically and empirically, that ACV quality deteriorates smoothly with noise in the initial fit, and demonstrates the accuracy and computational benefits of the proposed methods on a diverse set of real-world applications.
Uncertainty Prediction for Deep Sequential Regression Using Meta Models
This paper describes a flexible method that can generate symmetric and asymmetric uncertainty estimates, makes no assumptions about stationarity, and outperforms competitive baselines in both drift and non-drift scenarios.
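The meta-model idea can be sketched in generic scikit-learn terms; this is an assumption-laden illustration, not the paper's implementation (which targets deep sequential regression): a base model makes predictions, and a second model is trained on the base model's held-out residuals to estimate the error of each prediction.

```python
# Meta-model uncertainty sketch: train a second regressor to predict the
# absolute error of a base regressor. All model choices here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(800, 1))
# heteroscedastic noise: targets get noisier as |x| grows
y = X[:, 0] ** 2 + rng.normal(0, 0.1 + 0.4 * np.abs(X[:, 0]))

# fit base and meta models on disjoint splits so the residuals reflect
# out-of-sample error rather than the base model's training fit
X_base, X_meta, y_base, y_meta = train_test_split(X, y, random_state=0)
base = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_base, y_base)
residuals = np.abs(y_meta - base.predict(X_meta))
meta = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_meta, residuals)

X_new = np.array([[0.0], [2.5]])
pred = base.predict(X_new)       # point predictions
uncert = meta.predict(X_new)     # estimated error magnitude per prediction
print(pred, uncert)
```

Because the meta model regresses on absolute residuals, the resulting estimates can be asymmetric across the input space rather than a single global error bar.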
Calculating Interval Forecasts
The distinction between a forecasting method and a forecasting model is expounded, and some general comments are made as to why prediction intervals tend to be too narrow in practice to encompass…