# Beyond kappa: A review of interrater agreement measures

@article{Banerjee1999BeyondKA, title={Beyond kappa: A review of interrater agreement measures}, author={Mousumi Banerjee and Michelle Hopkins Capozzoli and Laura McSweeney and Debajyoti Sinha}, journal={Canadian Journal of Statistics}, year={1999}, volume={27} }

In 1960, Cohen introduced the kappa coefficient to measure chance‐corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater agreement measure have been proposed in the literature. This paper reviews and critiques various approaches to the study of interrater agreement, for which the relevant data comprise either nominal or ordinal categorical ratings from multiple raters. It presents a comprehensive compilation of the main…
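As an illustration of the chance-corrected agreement described in the abstract, the following is a minimal sketch of Cohen's kappa for two raters on a nominal scale. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on a nominal scale."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed proportion of subjects on which the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten subjects by two raters.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
print(round(cohen_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 8 of 10 subjects and chance agreement is 0.5, so kappa is about 0.6.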

## 878 Citations

### Assessing agreement between raters from the point of coefficients and loglinear models

- Mathematics
- 2017

In square contingency tables, the analysis of agreement between row and column classifications is of interest. For nominal categories, the kappa coefficient is used to summarize the degree of…

### Some Statistical Aspects of Measuring Agreement Based on a Modified Kappa

- Mathematics
- 2009

This paper focuses on statistical inference for assessing agreement or disagreement between two raters who classify on a two-level nominal scale. The purpose of this…

### On the Equivalence of Multirater Kappas Based on 2-Agreement and 3-Agreement with Binary Scores

- Physics
- 2012

Cohen’s kappa is a popular descriptive statistic for summarizing agreement between the classifications of two raters on a nominal scale. With more than two raters there are several views in the literature on how…

### Meta-analysis of Cohen’s kappa

- Psychology
- Health Services and Outcomes Research Methodology
- 2011

Cohen’s κ is the most important and most widely accepted measure of inter-rater reliability when the outcome of interest is measured on a nominal scale. The estimates of Cohen’s κ usually vary from…

### Computing inter-rater reliability and its variance in the presence of high agreement.

- Psychology
- The British Journal of Mathematical and Statistical Psychology
- 2008

This paper explores the origin of these limitations, introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient, and proposes new variance estimators for the multiple-rater generalized pi and AC1 statistics, whose validity does not depend on the hypothesis of independence between raters.
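The summary above does not give a formula. The sketch below assumes the commonly stated two-rater form of AC1, in which chance agreement is computed from the averaged marginal proportions of each category. The function name `gwet_ac1` and the example data are hypothetical.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=None):
    """Two-rater AC1 (assumed form): (p_a - p_e) / (1 - p_e)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    if categories is None:
        # Assumption: the category set is whatever either rater used.
        categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    # Observed proportion of agreement.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the average marginal proportion of each category.
    p_e = 0.0
    for k in categories:
        pi_k = (count_a[k] + count_b[k]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1
    return (p_a - p_e) / (1 - p_e)


# Hypothetical high-agreement data with one dominant category.
rater_1 = ["pos"] * 18 + ["pos", "neg"]
rater_2 = ["pos"] * 18 + ["neg", "pos"]
print(gwet_ac1(rater_1, rater_2))
```

With these ratings the observed agreement is 0.9, yet Cohen's kappa would be slightly negative because both raters assign "pos" to 95% of subjects. AC1 stays close to the observed agreement, which is the kind of stability under high agreement that the title refers to.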

### Multi-rater delta: extending the delta nominal measure of agreement between two raters to many raters

- Biology
- Journal of Statistical Computation and Simulation
- 2021

The coefficient delta is extended from R = 2 raters to many raters (the multi-rater delta coefficient), demonstrating that it can be expressed in kappa form and retains the advantages of delta over the classic kappa-type coefficients.

### Bayesian Inference for Kappa from Single and Multiple Studies

- Mathematics
- Biometrics
- 2000

Bayesian analysis for kappa that can be routinely implemented using Markov chain Monte Carlo methodology is described and extensive simulation is carried out to compare the performances of the Bayesian and the frequentist tests.

### Statistical description of interrater variability in ordinal ratings

- Sociology
- Statistical Methods in Medical Research
- 2000

A new graphical approach to describing interrater variability that involves a simple frequency distribution display of the category probabilities and provides a simple visual summary of the rating data is presented.

### A Comparison of Cohen's Kappa and Agreement Coefficients by Corrado Gini

- Economics
- 2013

The paper compares four coefficients that can be used to summarize inter-rater agreement on a nominal scale. The coefficients are Cohen's kappa and three coefficients that were originally proposed by…

## References


### The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability

- Psychology
- 1973

…or weighted kappa (Spitzer, Cohen, Fleiss and Endicott, 1967; Cohen, 1968a). Kappa is the proportion of agreement corrected for chance, and scaled to vary from -1 to +1 so that a negative value…
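A minimal sketch of weighted kappa for ordinal ratings follows, assuming categories coded as integers 0 to k-1 and disagreement weights |i - j| raised to a chosen power. Quadratic weights (power 2) are the case usually linked to the intraclass correlation. The function name `weighted_kappa` and the data are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, power=2):
    """Weighted kappa with disagreement weights |i - j| ** power."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n, k = len(ratings_a), n_categories
    # Cross-classification counts: rows are rater A, columns are rater B.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[a][b] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power
            observed += w * counts[i][j] / n
            expected += w * row[i] * col[j] / (n * n)
    if expected == 0:
        raise ValueError("weighted kappa is undefined for these marginals")
    return 1 - observed / expected


# Hypothetical ordinal ratings on a three-point scale (0, 1, 2).
rater_1 = [0, 1, 2, 1, 0, 2, 1, 1]
rater_2 = [0, 1, 2, 2, 0, 1, 1, 0]
print(weighted_kappa(rater_1, rater_2, n_categories=3))
```

With only two categories every disagreement has weight 1, so the function reduces to unweighted kappa.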

### Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic

- Psychology
- 1977

It frequently occurs in psychological research that an investigator is interested in assessing the extent of interrater agreement when the data are measured on an ordinal scale. This Monte Carlo…

### Measuring interrater reliability among multiple raters: an example of methods for nominal data.

- Mathematics
- Statistics in Medicine
- 1990

Modifications of previously published estimators, appropriate for measuring reliability under stratified sampling frames, are introduced, and these measures are interpreted in view of standard errors computed using the jackknife.
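The summary mentions jackknife standard errors. The sketch below shows the general idea only: a leave-one-subject-out jackknife applied to plain two-rater Cohen's kappa on an unstratified sample, which is a simplification and not the stratified multi-rater estimators of the paper. The names `cohen_kappa` and `jackknife_se` and the data are hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Two-rater Cohen's kappa on a nominal scale."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def jackknife_se(ratings_a, ratings_b):
    """Leave-one-subject-out jackknife standard error of kappa."""
    a, b = list(ratings_a), list(ratings_b)
    n = len(a)
    if n < 2 or n != len(b):
        raise ValueError("need two equal-length lists with at least 2 subjects")
    # Recompute kappa with each subject removed in turn.
    leave_one_out = [
        cohen_kappa(a[:i] + a[i + 1:], b[:i] + b[i + 1:]) for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    # Standard jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((k - mean) ** 2 for k in leave_one_out)
    return math.sqrt(variance)


# Hypothetical ratings of ten subjects by two raters.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
print(jackknife_se(rater_1, rater_2))
```

A leave-one-out sample can make kappa undefined when all remaining ratings fall in one category. This sketch raises an error in that case instead of guessing a value.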

### Extension of the kappa coefficient.

- Psychology
- Biometrics
- 1980

An extension of the kappa coefficient is proposed that is appropriate for use with multiple observations per subject and multiple response choices per observation, and new approaches to difficult problems in the evaluation of reliability are illustrated.

### Assessing interrater agreement from dependent data.

- Psychology
- Biometrics
- 1997

This work investigates the use of a latent model proposed by Qu, Piedmonte, and Medendorp (1995) to estimate the correlation between raters for each method, and test for their equality.

### Modelling patterns of agreement and disagreement

- Psychology
- Statistical Methods in Medical Research
- 1992

A survey of ways of statistically modelling patterns of observer agreement and disagreement is presented, with main emphasis on modelling inter-observer agreement for categorical responses, both for nominal and ordinal response scales.

### Measurement of interrater agreement with adjustment for covariates.

- Mathematics
- Biometrics
- 1996

The kappa coefficient measures chance-corrected agreement between two observers in the dichotomous classification of subjects and assumes both raters have the same marginal probability of classification, but this probability may depend on one or more covariates.

### Large sample standard errors of kappa and weighted kappa.

- Mathematics
- 1969

The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all…

### The measurement of observer agreement for categorical data.

- Mathematics
- Biometrics
- 1977

A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented. Tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.