Computing inter-rater reliability and its variance in the presence of high agreement.
@article{Gwet2008ComputingIR,
title={Computing inter-rater reliability and its variance in the presence of high agreement.},
author={Kilem L. Gwet},
journal={British Journal of Mathematical and Statistical Psychology},
year={2008},
volume={61},
number={1},
pages={29--48}
}

Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance…
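The contrast the abstract draws between kappa and AC1 can be illustrated numerically. The sketch below uses an invented 2×2 contingency table (not data from the paper) and the standard chance-agreement formulas: for Cohen's kappa, p_e = Σ_k p_{1k} p_{2k} (product of the raters' marginals); for Gwet's AC1, p_e = (1/(q−1)) Σ_k π_k(1−π_k), where π_k is the average of the two raters' marginal proportions for category k.

```python
# Minimal sketch (hypothetical example, not from the article): Cohen's kappa
# vs. Gwet's AC1 on a skewed 2x2 table with high observed agreement.

def kappa_and_ac1(table):
    """table[i][j]: count of subjects rated category i by rater 1, j by rater 2."""
    n = sum(sum(row) for row in table)
    q = len(table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_o = sum(table[k][k] for k in range(q)) / n
    # Marginal proportions for each rater.
    p1 = [sum(table[k]) / n for k in range(q)]                  # rater 1
    p2 = [sum(row[k] for row in table) / n for k in range(q)]   # rater 2
    # Cohen's kappa: chance agreement is the product of the marginals.
    pe_kappa = sum(p1[k] * p2[k] for k in range(q))
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    # Gwet's AC1: chance agreement from the average marginals pi_k.
    pi = [(p1[k] + p2[k]) / 2 for k in range(q)]
    pe_ac1 = sum(pi_k * (1 - pi_k) for pi_k in pi) / (q - 1)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# Skewed table: 92% observed agreement, yet kappa is low -- the "paradox".
table = [[90, 4],
         [4, 2]]
k, a = kappa_and_ac1(table)
print(f"kappa = {k:.3f}, AC1 = {a:.3f}")  # → kappa = 0.291, AC1 = 0.910
```

With nearly all subjects in one category, kappa's chance-agreement term is inflated by the skewed marginals, while AC1 remains close to the observed agreement, which is the behavior the paper's abstract describes as "more stable."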
905 Citations
Statistical inference of agreement coefficient between two raters with binary outcomes
- Psychology, Communications in Statistics - Theory and Methods
- 2019
Scott’s pi and Cohen’s kappa are widely used for assessing the degree of agreement between two raters with binary outcomes. However, many authors have pointed out their paradoxical behavior…
A Study on Comparison of Generalized Kappa Statistics in Agreement Analysis
- Mathematics
- 2012
Agreement analysis is conducted to assess reliability among rating results performed repeatedly on the same subjects by one or more raters. The kappa statistic is commonly used when rating scales are…
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance
- Mathematics, Educational and Psychological Measurement
- 2016
A technique similar to the classical pairwise t test for means is proposed, based on a large-sample linear approximation of the agreement coefficient; it requires neither advanced statistical modeling skills nor considerable computer programming experience.
A new coefficient of interrater agreement: The challenge of highly unequal category proportions.
- Psychology
- 2019
We derive a general structure that encompasses important coefficients of interrater agreement such as the S-coefficient, Cohen's kappa, Scott's pi, Fleiss' kappa, Krippendorff's alpha, and Gwet's…
Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters
- Mathematics
- 2008
Most inter-rater reliability studies using nominal scales suggest the existence of two populations of inference: the population of subjects (collection of objects or persons to be rated) and that of…
Statistical inference of Gwet’s AC1 coefficient for multiple raters and binary outcomes
- Psychology
- 2020
Cohen’s kappa and intraclass kappa are widely used for assessing the degree of agreement between two raters with binary outcomes. However, many authors have pointed out their paradoxical…
Fleiss’ kappa statistic without paradoxes
- Psychology
- 2015
The Fleiss’ kappa statistic is a well-known index for assessing the reliability of agreement between raters. It is used both in the psychological and in the psychiatric field. Unfortunately, the…
How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution?
- Sociology
- 2016
ABSTRACT Interrater reliability studies are used in a diverse set of fields. Often, these investigations involve three or more raters, and thus, require the use of indices such as Fleiss’s kappa,…
Implementing a General Framework for Assessing Interrater Agreement in Stata
- Computer Science, The Stata Journal: Promoting communications on statistics and Stata
- 2018
Gwet’s (2014, Handbook of Inter-Rater Reliability) recently developed framework of interrater agreement coefficients is reviewed and the kappaetc command is introduced, which implements this framework in Stata.
Large-Sample Variance of Fleiss Generalized Kappa
- Mathematics, Educational and Psychological Measurement
- 2021
The purpose of this article is to show that the large-sample variance of Fleiss’ generalized kappa is systematically being misused, is invalid as a precision measure for kappa, and cannot be used for constructing confidence intervals.
References
SHOWING 1-10 OF 24 REFERENCES
Beyond kappa: A review of interrater agreement measures
- Psychology
- 1999
In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater…
An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers.
- Mathematics, Biometrics
- 1977
A subset of observers who demonstrate a high level of interobserver agreement can be identified by using pairwise agreement statistics between each observer and the internal majority standard opinion on each subject.
Large sample standard errors of kappa and weighted kappa.
- Mathematics
- 1969
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all…
A Coefficient of Agreement for Nominal Scales
- Psychology
- 1960
CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of…
High agreement but low kappa: II. Resolving the paradoxes.
- Business, Journal of Clinical Epidemiology
- 1990
Integration and generalization of kappas for multiple raters.
- Physics
- 1980
J. A. Cohen's kappa (1960) for measuring agreement between 2 raters, using a nominal scale, has been extended for use with multiple raters by R. J. Light (1971) and J. L. Fleiss (1971). In the…
Ramifications of a population model for κ as a coefficient of reliability
- Psychology
- 1979
Coefficient κ is generally defined in terms of procedures of computation rather than in terms of a population. Here a population definition is proposed. On this basis, the interpretation of κ as a…
Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
- Physics, Psychological Bulletin
- 1968
The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k × k table of joint…
Categorical data analysis (2nd ed.)
- 2002