AUC: a misleading measure of the performance of predictive distribution models

Authors: Jorge M. Lobo, Alberto Jiménez‐Valverde, Raimundo Real
Journal: Global Ecology and Biogeography
The area under the receiver operating characteristic (ROC) curve, known as the AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence‐absence variable, by summarizing overall model performance over all possible thresholds. In this manuscript we review some of the features of this measure…
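As an illustration of what "summarizing overall model performance over all possible thresholds" means, here is a minimal Python sketch. It is not taken from the paper: the function names and the toy scores and labels are hypothetical. It computes the AUC in two ways: as the proportion of presence-absence pairs in which the presence receives the higher score (ties counted as one half), and as the trapezoidal area under the ROC points obtained by sweeping every distinct score as a threshold.

```python
def auc(scores, labels):
    """Share of (presence, absence) pairs where the presence scores higher.

    Ties count as half a win. Labels are 1 (presence) or 0 (absence).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false positive rate, sensitivity) at every distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # A site is predicted "present" when its score is at least t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: model scores and observed presence (1) / absence (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc(scores, labels))
print(trapezoid_area(roc_points(scores, labels)))
```

Both calls give 0.8125 on this toy data (13 of the 16 presence-absence pairs are ranked correctly). Only the ordering of the scores enters either calculation, so any strictly increasing transformation of the scores leaves the value unchanged.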

Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling
Aim: The area under the receiver operating characteristic (ROC) curve (AUC) is a widely used statistic for assessing the discriminatory capacity of species distribution models. Here, I used simulated…
Recommendations for using the relative operating characteristic (ROC)
The relative operating characteristic (ROC) is a widely used method to measure diagnostic signals including predictions of land changes, species distributions, and ecological niches. The ROC measures…
Novel Nonparametric Methods For ROC Curves
The receiver operating characteristic (ROC) curve is a widely used graphical method for evaluating the discriminating power of a diagnostic test or a statistical model in various areas such as…
Revisiting the ROC curve for diagnostic applications with an unbalanced class distribution
  • C. O'Reilly, T. Nielsen
  • Mathematics
  • 2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA)
  • 2013
This communication investigates the impact on classifier evaluation of a high asymmetry between positive and negative classes. It points out some necessary precautions when reporting classifier…
A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms
The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC.
Plotting receiver operating characteristic and precision–recall curves from presence and background data
  • Wenkai Li, Qinghua Guo
  • Medicine
  • Ecology and evolution
  • 2021
The proposed PB‐based ROC/PR plots can provide valuable complements to the existing model assessment methods, and they also provide an additional way to estimate the constant c (or species prevalence) from presence and background data.
Prevalence affects the evaluation of discrimination capacity in presence-absence species distribution models
The aim of this study is to understand how prevalence (the ratio of instances of presence to total sample size) affects the estimation of three discrimination indexes commonly used in distribution…
Rethinking receiver operating characteristic analysis applications in ecological niche modeling
It is shown that, comparing two ROCs, using the AUC systematically undervalues models that do not provide predictions across the entire spectrum of proportional areas in the study area.
Threshold-dependence as a desirable attribute for discrimination assessment: implications for the evaluation of species distribution models
Species distribution modelling has become a common approach in ecology in the last decades. As in any modelling exercise, evaluation of the predicted suitability surfaces is a key process, and the…
Sample size for the evaluation of presence-absence models
The effect of the training dataset sample size has been shown to have profound outcomes on the performance of species distribution models. However, the effects that the testing dataset…


Modifying ROC Curves to Incorporate Predicted Probabilities
The area under the ROC curve (AUC) is becoming a popular measure for the evaluation of classifiers, even more than other more classical measures, such as error/accuracy, logloss/entropy or precision.
The meaning and use of the area under a receiver operating characteristic (ROC) curve.
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics,…
The use of the area under the ROC curve in the evaluation of machine learning algorithms
AUC exhibits a number of desirable properties when compared to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; independence from the decision threshold; and invariance to a priori class probabilities.
Beware the Null Hypothesis: Critical Value Tables for Evaluating Classifiers
This paper provides tables with critical values pre-computed for the normal distribution, the t-distribution, etc., for the performance metrics of binary classification: accuracy, F-measure, area under the ROC curve (AUC), and true positives in the top ten.
Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine.
Receiver-operating characteristic (ROC) plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions.
A comparison of goodness-of-fit tests for the logistic regression model.
An examination of the performance of the tests when the correct model has a quadratic term but a model containing only the linear term has been fit shows that the Pearson chi-square, the unweighted sum-of-squares, the Hosmer-Lemeshow decile of risk, the smoothed residual sum-of-squares and Stukel's score test have power exceeding 50 per cent to detect moderate departures from linearity.
Selecting thresholds of occurrence in the prediction of species distributions
Twelve approaches to determining thresholds were compared using two species in Europe and artificial neural networks, and the modelling results were assessed using four indices: sensitivity, specificity, overall prediction success and Cohen's kappa statistic.
Evaluating predictive models of species’ distributions: criteria for selecting optimal models
The Genetic Algorithm for Rule-Set Prediction (GARP) is one of several current approaches to modeling species’ distributions using occurrence records and environmental data. Because of…
Coefficient Kappa: Some Uses, Misuses, and Alternatives
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these…
Measuring the accuracy of diagnostic systems.
  • J. Swets
  • Computer Science, Medicine
  • Science
  • 1988
For diagnostic systems used to distinguish between two classes of events, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy.