The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability

@article{Fleiss1973TheEO,
  title={The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability},
  author={Joseph L. Fleiss and Jacob Cohen},
  journal={Educational and Psychological Measurement},
  year={1973},
  volume={33},
  pages={613--619}
}
or weighted kappa (Spitzer, Cohen, Fleiss and Endicott, 1967; Cohen, 1968a). Kappa is the proportion of agreement corrected for chance, and scaled to vary from -1 to +1 so that a negative value indicates poorer than chance agreement, zero indicates exactly chance agreement, and a positive value indicates better than chance agreement. A value of unity indicates perfect agreement. The use of kappa implicitly assumes that all disagreements are equally serious. When the investigator can specify the…
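The chance correction described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: it assumes the disagreement-weight form of weighted kappa (one minus the ratio of observed to chance-expected weighted disagreement), and the function names `weighted_kappa` and `quadratic_weights` are hypothetical. With 0/1 weights it reduces to unweighted kappa, which treats all disagreements as equally serious.

```python
def weighted_kappa(table, weights=None):
    """Chance-corrected agreement for a k x k table of counts.

    table[i][j]   = number of subjects placed in category i by rater A
                    and in category j by rater B.
    weights[i][j] = disagreement weight for cell (i, j), zero on the
                    diagonal. Defaults to 0/1 weights (unweighted kappa).
    Assumes the chance-expected disagreement is non-zero.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        weights = [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]

    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    # Weighted disagreement actually observed, and expected under chance
    # (independent raters with the same marginals).
    observed = sum(weights[i][j] * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weights[i][j] * row_marg[i] * col_marg[j]
                   for i in range(k) for j in range(k))
    return 1.0 - observed / expected


def quadratic_weights(k):
    """Disagreement weights that grow with the squared category distance."""
    return [[float((i - j) ** 2) for j in range(k)] for i in range(k)]


# Hypothetical 2 x 2 table: 35 of 50 subjects are classified identically.
example = [[20, 5], [10, 15]]
kappa = weighted_kappa(example)
kappa_quadratic = weighted_kappa(example, quadratic_weights(2))
```

For a 2 × 2 table the quadratic weights coincide with the 0/1 weights, so both calls return the same value; the two diverge only with three or more ordered categories, where distant disagreements are penalised more heavily.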
A Note on the Interpretation of Weighted Kappa and its Relations to Other Rater Agreement Statistics for Metric Scales
This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute…
Conditional inequalities between Cohen's kappa and weighted kappas
Cohen’s kappa and weighted kappa are two standard tools for describing the degree of agreement between two observers on a categorical scale. For agreement tables with three or more…
Some Paradoxical Results for the Quadratically Weighted Kappa
The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the…
Beyond kappa: A review of interrater agreement measures
In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater…
Agreement among 2 × 2 Agreement Indices
A variety of measures of reliability for two-category nominal scales are reviewed and compared. It is shown that upon correcting these indices for chance agreement, there are only five distinct…
Equivalences of weighted kappas for multiple raters
Cohen’s unweighted kappa and weighted kappa are popular descriptive statistics for measuring agreement between two raters on a categorical scale. With m ≥ 3 raters, there are several views…
A note on Cohen’s weighted kappa coefficient of agreement with linear weights
Vanbelle and Albert [S. Vanbelle, A. Albert, A note on the linearly weighted kappa coefficient for ordinal scales, Statistical Methodology 6 (2008) 157–163] showed that the observed and…
Weighted Specific-Category Kappa Measure of Interobserver Agreement
TLDR
A kappa-based weighted measure (Kws) of agreement on a specific category s is proposed, with Kw being a weighted average of all the Kws; both measures are suitable for ordinal categories because of the weights used.
Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa.
  • M. Warrens
  • Mathematics, Medicine
  • The British journal of mathematical and statistical psychology
  • 2011
TLDR
This paper presents the general function, linear in both numerator and denominator, that becomes weighted kappa after correction for chance.
Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic
It frequently occurs in psychological research that an investigator is interested in assessing the extent of interrater agreement when the data are measured on an ordinal scale. This Monte Carlo…

References

SHOWING 1-10 OF 16 REFERENCES
Large sample standard errors of kappa and weighted kappa.
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all…
Bivariate Agreement Coefficients for Reliability of Data
The quality of data in content analysis, in surveys with open-ended questions, in the observation of unstructured social events, and so on, critically depends on the reliability with which primary…
A Coefficient of Agreement for Nominal Scales
CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of…
Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
  • J. Cohen
  • Mathematics, Medicine
  • Psychological bulletin
  • 1968
TLDR
The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k × k table of joi…
MOMENTS OF THE STATISTICS KAPPA AND WEIGHTED KAPPA
This paper considers the mean and variance of the two statistics, kappa and weighted kappa, which are useful in measuring agreement between two raters, in the situation where they independently…
Quantification of agreement in psychiatric diagnosis. A new approach.
TLDR
As generally used, all of the methods used for quantifying the salient features of the data suffer from one or more deficiencies which are illustrated using the hypothetical data of Table 1.
Multiple regression as a general data-analytic system.
Techniques for using multiple regression (MR) as a general variance-accounting procedure of great flexibility, power, and fidelity to research aims in both manipulative and observational…
Mental status schedule. Properties of factor-analytically derived scales.
TLDR
The latest version of the MSS, designated Form A, is currently being used in a number of projects involving such varied problems as the research evaluation of treatment, case finding, routine admission assessment, and the phenomenology of mental disorders.
DIAGNO. A computer program for psychiatric diagnosis utilizing the differential diagnostic procedure.
TLDR
The availability of a computer program for psychiatric diagnosis with demonstrated validity would make possible meaningful comparisons of the diagnostic composition of various populations.
The analysis of proximities: Multidimensional scaling with an unknown distance function. I.
A computer program is described that is designed to reconstruct the metric configuration of a set of points in Euclidean space on the basis of essentially nonmetric information about that…