Metrics of calibration for probabilistic predictions
@article{Ibarra2022MetricsOC,
  title={Metrics of calibration for probabilistic predictions},
  author={Imanol Arrieta Ibarra and Paman Gujral and Jonathan Tannen and Mark Tygert and Cherie Xu},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.09680}
}
Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, “reliability diagrams” (also known as “calibration plots”) help detect and diagnose statistically significant discrepancies — so-called “miscalibration” — between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values…
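The histogram-based binning that underlies canonical reliability diagrams can be sketched as follows; this is a minimal illustration with an assumed `n_bins` parameter and synthetic data, not the paper's exact construction.

```python
import numpy as np

def binned_calibration(probs, outcomes, n_bins=10):
    """Histogram-style calibration summary: within each probability bin,
    compare the mean predicted probability to the observed frequency."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo)
        if mask.any():
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows  # one (mean predicted, observed frequency, count) triple per bin

# Toy usage on well-calibrated synthetic predictions.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(float)
for pred, obs, n in binned_calibration(p, y):
    print(f"predicted {pred:.2f}  observed {obs:.2f}  n={n}")
```

Plotting observed frequency against mean predicted probability for each bin yields the reliability diagram; deviations from the diagonal indicate miscalibration.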
Figures from this paper (figures 1–36)
2 Citations
A Unifying Theory of Distance from Calibration
- Computer Science · ArXiv
- 2022
Fundamental lower and upper bounds on measuring distance to calibration are established, and theoretical justification is provided for preferring certain metrics (such as Laplace kernel calibration) in practice.
References
Showing 1–10 of 19 references
Measuring Calibration in Deep Learning
- Computer Science · CVPR Workshops
- 2019
A comprehensive empirical study of choices in calibration measures including measuring all probabilities rather than just the maximum prediction, thresholding probability values, class conditionality, number of bins, bins that are adaptive to the datapoint density, and the norm used to compare accuracies to confidences.
T-Cal: An optimal test for the calibration of predictive models
- Computer Science · ArXiv
- 2022
This work considers detecting miscalibration of predictive models using a finite validation dataset as a hypothesis-testing problem, and proposes T-Cal, a minimax-optimal test for calibration based on a debiased plug-in estimator of the ℓ2-Expected Calibration Error (ECE).
Some Remarks on the Reliability of Categorical Probability Forecasts
- Environmental Science
- 2008
Studies on forecast evaluation often rely on estimating limiting observed frequencies conditioned on specific forecast probabilities (the reliability diagram or calibration function). Obviously,…
Verified Uncertainty Calibration
- Computer Science · NeurIPS
- 2019
The scaling-binning calibrator is introduced, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration, and estimates a model's calibration error more accurately using an estimator from the meteorological community.
Mitigating bias in calibration error estimation
- Computer Science · AISTATS
- 2022
A simple alternative calibration error metric, ECE_sweep, is proposed, in which the number of bins is chosen to be as large as possible while preserving monotonicity in the calibration function; this yields a less biased estimator of calibration error.
Calibration of Neural Networks using Splines
- Computer Science · ICLR
- 2021
This work introduces a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test in which the main idea is to compare the respective cumulative probability distributions.
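The KS-style idea of comparing cumulative distributions can be sketched roughly as below; this is an illustration of the general approach (a Kolmogorov–Smirnov-type statistic on cumulative sums ordered by score), not the authors' exact construction.

```python
import numpy as np

def ks_calibration_error(probs, outcomes):
    """Binning-free calibration measure: maximum absolute difference between
    the running (cumulative) averages of predicted probabilities and of the
    observed outcomes, taken in order of increasing predicted probability."""
    order = np.argsort(probs)
    n = len(probs)
    cum_pred = np.cumsum(probs[order]) / n
    cum_obs = np.cumsum(outcomes[order]) / n
    return float(np.max(np.abs(cum_pred - cum_obs)))
```

Because no bins are involved, the measure avoids the bin-width choices that bias histogram-based estimators.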
A graphical method of cumulative differences between two subpopulations
- Computer Science · J. Big Data
- 2021
Cumulative methods are developed for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.
Cumulative deviation of a subpopulation from the full population
- Environmental Science · J. Big Data
- 2021
The cumulative deviation of the subpopulation from the full population proposed in this paper sidesteps problematic coarse binning and encodes the subpopulation's deviation directly as the slopes of secant lines on the graphs.
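The slope-as-deviation idea behind such cumulative graphs can be sketched as follows; variable names and the exact normalization here are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def cumulative_deviation(scores, outcomes, probs):
    """Cumulative difference between observed outcomes and predicted
    probabilities, accumulated in order of increasing score. On a plot of
    y against x, a steep secant line over some score range signals that
    predictions deviate from outcomes on that range; a flat graph near
    zero indicates calibration."""
    order = np.argsort(scores)
    n = len(scores)
    x = np.arange(1, n + 1) / n                        # abscissa: rank / n
    y = np.cumsum(outcomes[order] - probs[order]) / n  # ordinate: cumulative deviation
    return x, y
```

No binning is required: the graph is defined pointwise at every observation, which is what lets miscalibration appear as slope rather than as bin-dependent bar heights.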
Deep Residual Learning for Image Recognition
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.