Measuring classifier performance: a coherent alternative to the area under the ROC curve

@article{Hand2009MeasuringCP,
  title={Measuring classifier performance: a coherent alternative to the area under the ROC curve},
  author={David J. Hand},
  journal={Machine Learning},
  year={2009},
  volume={77},
  pages={103-123}
}
  • D. Hand
  • Published 1 October 2009
  • Computer Science
  • Machine Learning
The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, and one which appears not to have been previously recognised. This… 
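For reference alongside the abstract, the sketch below (illustrative only; the function name auc_mann_whitney and the toy scores are mine, not the paper's) computes the AUC in its rank-based form: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counted as one half.

import numpy as np

def auc_mann_whitney(pos_scores, neg_scores):
    # AUC as P(random positive scored above random negative), ties counted as 0.5.
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return ((pos > neg).sum() + 0.5 * (pos == neg).sum()) / (pos.size * neg.size)

# Toy usage with made-up scores: three positives vs three negatives.
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9 ≈ 0.889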
On the coherence of AUC
TLDR
Should one wish to consider only optimal thresholds, it is demonstrated that a simple and more intuitive alternative to Hand's H measure is already available in the form of the area under the cost curve, which uses a uniform weighting and is hence model-independent.
A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance
TLDR
Should one wish to consider only optimal thresholds, it is demonstrated that a simple and more intuitive alternative to Hand's H measure is already available in the form of the area under the cost curve, which uses a uniform weighting and is hence model-independent.
When is the area under the receiver operating characteristic curve an appropriate measure of classifier performance?
Half-AUC for the evaluation of sensitive or specific classifiers
  • A. Bradley
  • Computer Science
    Pattern Recognit. Lett.
  • 2014
A new evaluation measure for learning from imbalanced data
TLDR
Experimental results on 36 imbalanced data sets using SVMs and logistic regression show that B42 is a good choice for evaluating on imbalanced data sets because it puts more weight on the minority class, and balanced random undersampling does not work for large and highly imbalanced data sets, although it has been reported to be effective for small data sets.
Measuring classification performance: the hmeasure package
TLDR
The hmeasure package computes and reports the H measure alongside most commonly used alternatives, including the AUC, and provides convenient plotting routines that yield insights into the differences and similarities between the various metrics.
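For concreteness, here is a rough Python sketch of the quantity being reported, based on my reading of Hand (2009) rather than on the package itself (which is written in R): the expected minimum cost-weighted loss under a Beta(a, b) distribution over the cost proportion, normalised against the best trivial always-one-class rule. The function name, the default a = b = 2, and the grid-based integration are illustrative assumptions.

import numpy as np
from scipy.stats import beta

def h_measure_sketch(scores, labels, a=2.0, b=2.0, grid=1001):
    # Illustrative H measure: 1 - L / L_ref, where L is the Beta(a, b)-weighted
    # expected minimum cost-weighted loss and L_ref the same for a trivial rule.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)            # 1 = positive, 0 = negative
    pi1 = labels.mean()
    pi0 = 1.0 - pi1
    thresholds = np.r_[-np.inf, np.unique(scores), np.inf]
    fnr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    fpr = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    cs = np.linspace(0.0, 1.0, grid)                  # grid of cost proportions
    w = beta.pdf(cs, a, b)                            # cost-severity weighting
    loss = np.array([np.min(c * pi1 * fnr + (1 - c) * pi0 * fpr) for c in cs])
    loss_trivial = np.minimum(cs * pi1, (1 - cs) * pi0)
    return float(1.0 - np.trapz(loss * w, cs) / np.trapz(loss_trivial * w, cs))

print(h_measure_sketch([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0, 0]))

Setting a = b = 1 makes the weighting uniform, in which case the numerator is, up to normalisation, the area under the optimal-threshold cost curve discussed in the citing papers above.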
A better Beta for the H measure of classification performance
Threshold Choice Methods: the Missing Link
TLDR
The analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, derives several connections between the aforementioned performance metrics, and highlights the role of calibration in choosing the threshold choice method.
A unified view of performance metrics: translating threshold choice into expected classification loss
TLDR
This analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation which can be summarised as follows: given a model, apply the threshold choice methods that correspond with the available information about the operating condition, and compare their expected losses.
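The recipe in this TLDR can be made concrete with a small sketch, which is illustrative rather than the authors' code: it assumes scores in [0, 1], a cost proportion c drawn uniformly from [0, 1], and two hypothetical threshold choice methods, a fixed threshold of 0.5 (used when nothing is known about the operating condition) and a score-driven threshold t = c (used when c is known and the scores are treated as calibrated probabilities); each method is then compared by its expected loss.

import numpy as np

def expected_loss(scores, labels, choose_threshold, grid=1001):
    # Expected cost-weighted loss over a uniform distribution of cost proportions,
    # for a given threshold choice method choose_threshold(c) -> threshold.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)            # 1 = positive, 0 = negative
    pi1 = labels.mean()
    pi0 = 1.0 - pi1
    losses = []
    for c in np.linspace(0.0, 1.0, grid):
        t = choose_threshold(c)
        fnr = (scores[labels == 1] < t).mean()
        fpr = (scores[labels == 0] >= t).mean()
        losses.append(c * pi1 * fnr + (1 - c) * pi0 * fpr)
    return float(np.mean(losses))

fixed = lambda c: 0.5        # ignores the operating condition entirely
score_driven = lambda c: c   # uses c directly, assuming calibrated probability scores

scores = [0.9, 0.8, 0.65, 0.4, 0.3, 0.2]              # hypothetical toy data
labels = [1, 1, 0, 1, 0, 0]
print(expected_loss(scores, labels, fixed), expected_loss(scores, labels, score_driven))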
...

References

Showing 1-10 of 33 references
A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems
TLDR
This work extends the definition of the area under the ROC curve to the case of more than two classes by averaging pairwise comparisons and proposes an alternative definition of proportion correct based on pairwise comparison of classes for a simple artificial case.
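To illustrate the averaging construction this snippet refers to, here is a small sketch following my reading of Hand and Till's multi-class measure; the function names, the assumption that class labels are 0..K-1 and index the columns of the score matrix, and the toy data are mine, not the paper's.

import numpy as np
from itertools import combinations

def pairwise_auc(pos_scores, neg_scores):
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return ((pos > neg).sum() + 0.5 * (pos == neg).sum()) / (pos.size * neg.size)

def multiclass_auc(score_matrix, y):
    # Average, over unordered class pairs, of the two "class i vs class j" AUCs
    # computed from each class's own score column.
    S = np.asarray(score_matrix, dtype=float)          # shape (n_samples, n_classes)
    y = np.asarray(y)
    vals = []
    for i, j in combinations(np.unique(y), 2):
        a_ij = pairwise_auc(S[y == i, i], S[y == j, i])   # judged by class i's scores
        a_ji = pairwise_auc(S[y == j, j], S[y == i, j])   # judged by class j's scores
        vals.append(0.5 * (a_ij + a_ji))
    return float(np.mean(vals))

# Toy usage: three classes, per-class scores in the columns of S (made-up numbers).
S = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
print(multiclass_auc(S, [0, 0, 1, 2]))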
Comparing classifiers when the misallocation costs are uncertain
The meaning and use of the area under a receiver operating characteristic (ROC) curve.
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics,
Analyzing a Portion of the ROC Curve
  • D. McClish
  • Mathematics
    Medical decision making : an international journal of the Society for Medical Decision Making
  • 1989
The area under the ROC curve is a common index summarizing the information contained in the curve. When comparing two ROC curves, though, problems arise when interest does not lie in the entire range
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
TLDR
The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers to present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs.
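As a mechanical illustration of the convex hull idea (my own sketch, not the authors' code), the function below keeps only those (false positive rate, true positive rate) operating points lying on the upper convex hull of a set of ROC points, i.e. the points that can be optimal for some combination of class distribution and misclassification costs.

def roc_convex_hull(points):
    # points: iterable of (fpr, tpr) pairs; returns the upper convex hull from
    # (0, 0) to (1, 1) using a monotone-chain construction.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    hull = []
    for p in reversed(pts):                      # build the upper hull right to left
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull[::-1]

# Toy usage: (0.3, 0.6) lies below the hull and is dropped as dominated.
print(roc_convex_hull([(0.1, 0.5), (0.3, 0.6), (0.4, 0.9)]))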
The Geometry of ROC Space: Understanding Machine Learning Metrics through ROC Isometrics
TLDR
The paper demonstrates that the graphical depiction of machine learning metrics by means of ROC isometrics gives many useful insights into the characteristics of these metrics, and provides a foundation on which a theory of machine learning metrics can be built.
Statistical Pattern Recognition
  • J. Davis
  • Computer Science
    Technometrics
  • 2003
This chapter introduces the subject of statistical pattern recognition (SPR). It starts by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates
Mining Supervised Classification Performance Studies: A Meta-Analytic Investigation
TLDR
It is argued that the current state of the literature hardly allows large-scale investigations; one possible way to analyze the resulting data is an overall assessment of the classification methods, and methods are presented for that particular aim.
Parcel: Feature Subset Selection in Variable Cost Domains
TLDR
A design method for feature selection in the presence of varying costs, starting from the Wilcoxon nonparametric statistic for the performance of a classification system, is presented, and a concept called the maximum realisable receiver operating characteristic is introduced, together with a supporting proof.
...