Interjudge Agreement and the Maximum Value of Kappa

@article{Umesh1989InterjudgeAA,
  title={Interjudge Agreement and the Maximum Value of Kappa},
  author={U. N. Umesh and Robert A. Peterson and Matthew H. Sauber},
  journal={Educational and Psychological Measurement},
  year={1989},
  volume={49},
  pages={835--850}
}
The observed degree of agreement between judges is commonly summarized using Cohen's (1960) kappa. Previous research has related values of kappa to the marginal distributions of the agreement matrix. This manuscript provides an approach for calculating maximum values of kappa as a function of observed agreement proportions between judges. Solutions are provided separately for matrices of size 2 x 2, 3 x 3, 4 x 4, and k x k; plots are provided for the 2 x 2, 3 x 3, and 4 x 4 matrices.
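For reference, Cohen's kappa is defined as kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed proportion of agreement and P_e is the agreement expected by chance from the judges' marginal proportions. The Python sketch below (the helper name kappa_and_max is a label of mine, not the paper's) illustrates one familiar ceiling on kappa, obtained by holding the marginals fixed and placing as much probability on the diagonal as they allow; it is meant as an orientation to the quantity being bounded, not as a reproduction of the paper's own solutions, which are expressed as a function of the observed agreement proportion.

    import numpy as np

    def kappa_and_max(table):
        """Cohen's kappa and the marginal-constrained maximum of kappa.

        `table` is a k x k matrix of agreement counts (judge 1 in rows,
        judge 2 in columns). The maximum uses the classical bound
        P_o(max) = sum_i min(row_i, col_i) for fixed marginals.
        """
        t = np.asarray(table, dtype=float)
        n = t.sum()
        row = t.sum(axis=1) / n              # judge 1 marginal proportions
        col = t.sum(axis=0) / n              # judge 2 marginal proportions
        p_o = np.trace(t) / n                # observed agreement
        p_e = float(np.dot(row, col))        # chance-expected agreement
        kappa = (p_o - p_e) / (1.0 - p_e)
        p_o_max = float(np.minimum(row, col).sum())  # most diagonal mass the marginals allow
        kappa_max = (p_o_max - p_e) / (1.0 - p_e)
        return kappa, kappa_max

    # A 2 x 2 example with unequal marginals, so the maximum falls short of 1:
    # kappa = 0.70, kappa_max = 0.90.
    print(kappa_and_max([[40, 10],
                         [ 5, 45]]))

Because the two judges assign 50% and 45% of cases to the first category, no rearrangement of the off-diagonal cells can push kappa past 0.90 with these marginals.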
Interrater Agreement Measures: Comments on Kappa_n, Cohen's Kappa, Scott's π, and Aickin's α
The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., base ...
Nonasymptotic Significance Tests for Two Measures of Agreement
A FORTRAN program is described that computes Cohen's kappa and Brennan and Prediger's kappas and their associated probability values based on Monte Carlo resampling and the binomial distribution, respectively.
ComKappa: A Windows ’95 program for calculating kappa and related statistics
ComKappa is a user-friendly program based on the kappa coefficient; it calculates agreement corrected for chance under a model in which observers assume an even distribution of events across all codes, and it provides a number of kappa-related statistics that aid in the interpretation of obtained kappas.
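The "even distribution of events across all codes" chance model mentioned here is the one usually attributed to Brennan and Prediger: with k codes, the expected chance agreement is simply 1/k rather than a product of the observed marginals. A minimal sketch follows (the function name uniform_chance_kappa is mine and this is the generic formula, not ComKappa's actual code):

    def uniform_chance_kappa(table):
        """Chance-corrected agreement when each of the k codes is assumed
        equally likely, i.e. kappa_n = (P_o - 1/k) / (1 - 1/k)."""
        k = len(table)
        n = float(sum(sum(r) for r in table))
        p_o = sum(table[i][i] for i in range(k)) / n   # observed agreement
        p_e = 1.0 / k                                  # uniform chance agreement
        return (p_o - p_e) / (1.0 - p_e)

    # With the 2 x 2 table used above, P_o = 0.85 and k = 2, giving 0.70.
    print(uniform_chance_kappa([[40, 10],
                                [ 5, 45]]))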
Separation of systematic and random differences in ordinal rating scales.
A new statistical method is introduced, which separates and measures different types of variability between paired ordered categorical measurements, and describes the variance of the rank differences between judgements as a suitable measure of this interrater variability, which is characterized as random.
Analysis of the Weighted Kappa and Its Maximum with Markov Moves.
In this paper, the notion of Markov move from algebraic statistics is used to analyze the weighted kappa indices in rater agreement problems. In particular, the problem of the maximum kappa and its ...
Benchmarking Kappa: Interrater Agreement in Software Process Assessments
K. Emam, Empirical Software Engineering, 2004
A benchmark for interpreting Kappa values is developed using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (this is an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment).
Comparing clusterings using combination of the kappa statistic and entropy-based measure
A method for evaluating the agreement of clusterings, based on a combination of Cohen's kappa statistic and a normalized mutual information measure, is proposed; the combined measure is independent of the size of the data set and the shape of the clusters.
Benchmarking Kappa for Software Process Assessment Reliability Studies
Power Weighted Versions of Bennett, Alpert, and Goldstein's S
A weighted version of Bennett, Alpert, and Goldstein's S is studied. It is shown that its special cases are often ordered in the same way. It is also shown that many special cases of ...
...

References

The measurement of observer agreement for categorical data.
A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are given in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Coefficient Kappa: Some Uses, Misuses, and Alternatives
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these ...
A Coefficient of Agreement for Nominal Scales
Consider Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of ...
Kappa, Measures of Marginal Symmetry and Intraclass Correlations
Some suggestions for measuring marginal symmetry in agreement matrices for categorical data are discussed, together with measures of item-by-item agreement conditional on marginal asymmetry.