# Interjudge Agreement and the Maximum Value of Kappa

@article{Umesh1989InterjudgeAA, title={Interjudge Agreement and the Maximum Value of Kappa}, author={U. N. Umesh and Robert A. Peterson and Matthew H. Sauber}, journal={Educational and Psychological Measurement}, year={1989}, volume={49}, pages={835-850} }

The observed degree of agreement between judges is commonly summarized using Cohen's (1960) kappa. Previous research has related values of kappa to the marginal distributions of the agreement matrix. This manuscript provides an approach for calculating maximum values of kappa as a function of observed agreement proportions between judges. Solutions are provided separately for matrices of size 2 × 2, 3 × 3, 4 × 4, and k × k; plots are provided for the 2 × 2, 3 × 3, and 4 × 4 matrices.
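The quantities involved can be sketched in a few lines. Below is a minimal illustration (not the paper's own derivation, which expresses maximum kappa as a function of observed agreement proportions) of Cohen's kappa and the standard upper bound on kappa when the marginals of the agreement matrix are held fixed:

```python
import numpy as np

def cohens_kappa(matrix):
    """Cohen's (1960) kappa for a square agreement matrix of counts."""
    m = np.asarray(matrix, dtype=float)
    m = m / m.sum()                      # counts -> proportions
    p_o = np.trace(m)                    # observed agreement
    p_e = m.sum(axis=1) @ m.sum(axis=0)  # chance agreement from marginals
    return (p_o - p_e) / (1.0 - p_e)

def kappa_max(matrix):
    """Largest kappa attainable with the observed marginals fixed:
    the diagonal can carry at most min(row_i, col_i) in each category."""
    m = np.asarray(matrix, dtype=float)
    m = m / m.sum()
    rows, cols = m.sum(axis=1), m.sum(axis=0)
    p_o_max = np.minimum(rows, cols).sum()  # best possible diagonal mass
    p_e = rows @ cols
    return (p_o_max - p_e) / (1.0 - p_e)
```

For the 2 × 2 matrix `[[20, 5], [10, 15]]`, kappa is 0.4 while the marginals cap it at 0.8, showing how far below its ceiling an observed kappa can sit.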

## 64 Citations

Interrater Agreement Measures: Comments on Kappaₙ, Cohen's Kappa, Scott's π, and Aickin's α

- Psychology
- 2003

The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., base…

Nonasymptotic Significance Tests for Two Measures of Agreement

- Mathematics
- Perceptual and Motor Skills
- 2001

A FORTRAN program is described that computes Cohen's kappa and Brennan and Prediger's kappas and their associated probability values based on Monte Carlo resampling and the binomial distribution, respectively.
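The Monte Carlo resampling idea behind such significance tests is straightforward to sketch. The following is a hypothetical Python illustration, not the FORTRAN program described above: one rater's labels are repeatedly shuffled to build a null distribution of kappa, and the p-value is the fraction of shuffled kappas at least as large as the observed one.

```python
import random

def kappa_from_labels(a, b):
    """Cohen's kappa from two equal-length label sequences."""
    n = len(a)
    cats = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (p_o - p_e) / (1.0 - p_e)

def kappa_permutation_p(a, b, n_resamples=2000, seed=0):
    """Monte Carlo p-value: shuffle one rater's labels and count
    how often the shuffled kappa reaches the observed kappa."""
    rng = random.Random(seed)
    observed = kappa_from_labels(a, b)
    b_shuffled = list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(b_shuffled)
        if kappa_from_labels(a, b_shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)  # add-one to avoid p = 0
```

Shuffling preserves each rater's marginal category frequencies, so the test asks how surprising the observed agreement is given those marginals.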

Comkappa: A Windows ’95 program for calculating kappa and related statistics

- Computer Science
- 1998

ComKappa is a user-friendly program built around the kappa coefficient; it calculates agreement corrected for chance under a model in which observers are assumed to distribute events evenly across all codes, along with a number of kappa-related statistics that aid in interpreting obtained kappas.
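The uniform-chance model mentioned above corresponds to the coefficient of Brennan and Prediger (1981), cited in the references below, where chance agreement is fixed at 1/k for k codes rather than derived from the observed marginals. A minimal sketch, assuming a square matrix of counts:

```python
def brennan_prediger_kappa(matrix):
    """Chance-corrected agreement with a uniform chance model:
    p_e = 1/k for k codes, independent of the observed marginals."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(k)) / total  # observed agreement
    p_e = 1.0 / k                                      # uniform chance
    return (p_o - p_e) / (1.0 - p_e)
```

Because p_e does not depend on the marginals, this coefficient avoids the base-rate penalty discussed in the first citation above.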

Separation of systematic and random differences in ordinal rating scales.

- Mathematics
- Statistics in Medicine
- 1994

A new statistical method is introduced that separates and measures different types of variability between paired ordered categorical measurements; the variance of the rank differences between judgements is described as a suitable measure of the interrater variability that is characterized as random.

Analysis of the Weighted Kappa and Its Maximum with Markov Moves.

- Mathematics
- Psychometrika
- 2022

In this paper, the notion of Markov move from algebraic statistics is used to analyze the weighted kappa indices in rater agreement problems. In particular, the problem of the maximum kappa and its…
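Weighted kappa generalizes Cohen's kappa by penalizing disagreements unevenly across cell positions. A minimal sketch using quadratic weights, a common default (not necessarily the weighting scheme analyzed in the cited paper):

```python
import numpy as np

def weighted_kappa(matrix, weights=None):
    """Cohen's weighted kappa for a square agreement matrix of counts.
    Defaults to quadratic disagreement weights w_ij = (i-j)^2/(k-1)^2."""
    m = np.asarray(matrix, dtype=float)
    m = m / m.sum()
    k = m.shape[0]
    if weights is None:
        i, j = np.indices((k, k))
        weights = (i - j) ** 2 / (k - 1) ** 2      # quadratic weights
    e = np.outer(m.sum(axis=1), m.sum(axis=0))     # chance proportions
    return 1.0 - (weights * m).sum() / (weights * e).sum()
```

With 0/1 weights off the diagonal (the 2 × 2 case above reduces to this), weighted kappa coincides with ordinary kappa; the maximum-kappa question then reappears for each choice of weights, which is the setting the cited paper studies.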

Benchmarking Kappa: Interrater Agreement in Software Process Assessments

- Computer Science
- Empirical Software Engineering
- 2004

A benchmark for interpreting Kappa values is developed using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (this is an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment).

Comparing clusterings using combination of the kappa statistic and entropy-based measure

- Computer Science
- 2019

A method for evaluating the agreement of clusterings is proposed, based on combining Cohen's kappa statistic with a normalized-mutual-information, entropy-based measure; the resulting measure is independent of data-set size and cluster shape.

Benchmarking Kappa for Software Process Assessment Reliability Studies

- Computer Science
- 1998

A benchmark for interpreting Kappa values is developed using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (this is an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment).

Detecting Sequential Patterns and Determining Their Reliability With Fallible Observers

- Computer Science
- 2001

Analysis shows that for identically fallible observers, values of kappa are lower when codes are few and their simple probabilities variable than when codes are many and roughly equiprobable; thus no single value of kappa can be regarded as universally acceptable.

## References

SHOWING 1-5 OF 5 REFERENCES

The measurement of observer agreement for categorical data.

- Mathematics
- Biometrics
- 1977

A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are framed in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.

Coefficient Kappa: Some Uses, Misuses, and Alternatives

- Psychology
- 1981

This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these…

A Coefficient of Agreement for Nominal Scales

- Psychology
- 1960

CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of…

Kappa, Measures of Marginal Symmetry and Intraclass Correlations

- Psychology
- 1985

Some suggestions for measuring marginal symmetry in agreement matrices for categorical data are discussed, together with measures of item-by-item agreement conditional on marginal asymmetry.…