Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment

@article{Pontius2011DeathTK,
  title={Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment},
  author={Robert Gilmore Pontius and Marco Millones},
  journal={International Journal of Remote Sensing},
  year={2011},
  volume={32},
  pages={4407--4429}
}
  • R. Pontius, M. Millones
  • Published 1 August 2011
  • Environmental Science
  • International Journal of Remote Sensing
The family of Kappa indices of agreement claims to compare a map's observed classification accuracy relative to the expected accuracy of baseline maps that can have two types of randomness: (1) random distribution of the quantity of each category and (2) random spatial allocation of the categories. Use of the Kappa indices has become part of the culture in remote sensing and other fields. This article examines five different Kappa indices, some of which were derived by the first author in 2000…
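As a rough illustration of the quantities named in the title and abstract, the sketch below computes the standard Kappa alongside quantity disagreement and allocation disagreement from a square confusion matrix. The excerpt above does not give the formulas, so the definitions used here are an assumption based on how these measures are commonly stated: quantity disagreement is half the summed absolute difference between each category's row and column totals, and allocation disagreement is half the sum of twice the smaller of each category's omission and commission proportions. The function name `agreement_summary` and the convention that rows are the comparison map and columns are the reference are hypothetical choices for this example.

```python
from typing import Dict, List


def agreement_summary(counts: List[List[float]]) -> Dict[str, float]:
    """Summarize a square confusion matrix (rows: comparison map, columns: reference).

    Assumed definitions, not taken from the excerpt above:
      kappa      = (observed - expected) / (1 - expected)
      quantity   = sum_g |row_g - col_g| / 2
      allocation = sum_g 2 * min(row_g - p_gg, col_g - p_gg) / 2
    """
    n = len(counts)
    if any(len(row) != n for row in counts):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("confusion matrix must contain observations")

    # Convert counts to proportions of the whole sample.
    p = [[c / total for c in row] for row in counts]
    row = [sum(p[i]) for i in range(n)]
    col = [sum(p[i][j] for i in range(n)) for j in range(n)]

    # Observed agreement is the diagonal; expected agreement comes from the marginals.
    observed = sum(p[g][g] for g in range(n))
    expected = sum(row[g] * col[g] for g in range(n))
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")

    # Quantity disagreement: mismatch in how much of each category there is.
    quantity = sum(abs(row[g] - col[g]) for g in range(n)) / 2
    # Allocation disagreement: mismatch in where the categories are placed.
    allocation = sum(2 * min(row[g] - p[g][g], col[g] - p[g][g]) for g in range(n)) / 2

    return {
        "observed_agreement": observed,
        "kappa": kappa,
        "quantity_disagreement": quantity,
        "allocation_disagreement": allocation,
    }


if __name__ == "__main__":
    # Made-up two-category matrix, for illustration only.
    print(agreement_summary([[40, 10], [20, 30]]))
```

Under these definitions the two disagreement components sum to one minus the observed agreement, so they split the total error into a "how much" part and a "where" part instead of rescaling it against a random baseline as Kappa does.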
QADI as a New Method and Alternative to Kappa for Accuracy Assessment of Remote Sensing-Based Image Classification
Classification is a very common image processing task. The accuracy of the classified map is typically assessed through a comparison with real-world situations or with available reference data to
Analysis of Thematic Similarity Using Confusion Matrices
A new statistical tool is presented to evaluate the similarity between two confusion matrices that takes into account that the number of sample units correctly and incorrectly classified can be modeled by means of a multinomial distribution and is considered a test function based on the discrete squared Hellinger distance.
Recommendations for using the relative operating characteristic (ROC)
The relative operating characteristic (ROC) is a widely-used method to measure diagnostic signals including predictions of land changes, species distributions, and ecological niches. The ROC measures
Prevalence dependence in model goodness measures with special emphasis on true skill statistics
Sources of prevalence dependence in TSS can serve as a checklist to safely control comparisons, so that true discrimination capacity is compared as opposed to artefacts arising from data structure, species characteristics, or the calculation of the comparison measure (here TSS).
The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen’s Kappa and Brier Score in Binary Classification Assessment
The Matthews correlation coefficient (MCC) is compared with two popular scores: Cohen’s Kappa, a metric that originated in social sciences, and the Brier score, a strictly proper scoring function which emerged in weather forecasting studies.
The T Index: Measuring the Reliability of Accuracy Estimates Obtained from Non-Probability Samples
The T index is introduced and is proposed through the prism of significance testing, with T values < 0.05 indicating unreliable accuracy estimates, to build trust and improve the transparency of accuracy assessment in conditions which deviate from best practices.

References

Correct Formation of the Kappa Coefficient of Agreement
Although the formulas as presented by Fleiss et al. (1969) and Bishop et al. (1975) appear substantially different, they are algebraically equivalent. Also note that the formulas for θ1, θ2,
Quantification Error Versus Location Error in Comparison of Categorical Maps
This paper analyzes quantification error versus location error in a comparison between two cellular maps that show a categorical variable. Quantification error occurs when the quantity of cells of a
Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS)
Summary 1In recent years the use of species distribution models by ecologists and conservation managers has increased considerably, along with an awareness of the need to provide accuracy assessment
Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy
  • G. Foody
  • Environmental Science, Mathematics
  • 2004
The accuracy of thematic maps derived by image classification analyses is often compared in remote sensing studies. This comparison is typically achieved by a basic subjective assessment of the
Selecting and interpreting measures of thematic classification accuracy
Coefficient Kappa: Some Uses, Misuses, and Alternatives
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these
Status of land cover classification accuracy assessment
Measures of association for cross classifications
Abstract When populations are cross-classified with respect to two or more classifications or polytomies, questions often arise about the degree of association existing between the several