The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability

Joseph L. Fleiss and Jacob Cohen
Educational and Psychological Measurement, pp. 613-619
… or weighted kappa (Spitzer, Cohen, Fleiss and Endicott, 1967; Cohen, 1968a). Kappa is the proportion of agreement corrected for chance, and scaled to vary from -1 to +1 so that a negative value indicates poorer than chance agreement, zero indicates exactly chance agreement, and a positive value indicates better than chance agreement. A value of unity indicates perfect agreement. The use of kappa implicitly assumes that all disagreements are equally serious. When the investigator can specify the…
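As an illustration of the chance correction described above, the following is a minimal Python sketch, not the authors' own procedure. It computes kappa from disagreement weights: unweighted kappa treats every disagreement as equally serious (weight 1), while the quadratic option weights a disagreement by the squared distance between categories. The function names (weighted_kappa, moment_form) and the ratings are hypothetical and invented for this example. The second function assumes integer category scores and uses the known identity that quadratically weighted kappa can be written in terms of rater means, variances, and the covariance, the form that links it to an intraclass correlation.

```python
def weighted_kappa(r1, r2, categories, weight=None):
    """Kappa as 1 - (observed weighted disagreement / chance-expected weighted disagreement).

    weight(i, j) is the disagreement weight for category indices i and j.
    With the default 0/1 weights this reduces to unweighted kappa.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions of the k x k table.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        p[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    if weight is None:
        weight = lambda i, j: 0.0 if i == j else 1.0

    q_obs = sum(weight(i, j) * p[i][j] for i in range(k) for j in range(k))
    q_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - q_obs / q_exp


def quadratic(i, j):
    """Quadratic disagreement weight: squared distance between category indices."""
    return float((i - j) ** 2)


def moment_form(r1, r2):
    """Quadratically weighted kappa from means, variances, and covariance (divisor n).

    Assumes the ratings are equally spaced numeric scores.
    """
    n = len(r1)
    m1, m2 = sum(r1) / n, sum(r2) / n
    v1 = sum((a - m1) ** 2 for a in r1) / n
    v2 = sum((b - m2) ** 2 for b in r2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / n
    return 2.0 * c12 / (v1 + v2 + (m1 - m2) ** 2)


# Hypothetical ratings of eight subjects by two raters on a 3-point ordinal scale.
rater1 = [1, 2, 3, 3, 2, 1, 3, 2]
rater2 = [1, 2, 3, 2, 2, 1, 3, 3]
scale = [1, 2, 3]

kappa = weighted_kappa(rater1, rater2, scale)
kappa_w = weighted_kappa(rater1, rater2, scale, weight=quadratic)
print(kappa, kappa_w, moment_form(rater1, rater2))
```

In this sketch the two disagreements are between adjacent categories, so the quadratic weighting penalizes them less than unweighted kappa does, and the last two printed values agree up to rounding.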
A Note on the Interpretation of Weighted Kappa and its Relations to Other Rater Agreement Statistics for Metric Scales
This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute
Conditional inequalities between Cohen's kappa and weighted kappas
Some Paradoxical Results for the Quadratically Weighted Kappa
The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the
Beyond kappa: A review of interrater agreement measures
In 1960, Cohen introduced the kappa coefficient to measure chance‐corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater
Agreement among 2 × 2 Agreement Indices
A variety of measures of reliability for two-category nominal scales are reviewed and compared. It is shown that upon correcting these indices for chance agreement, there are only five distinct
Equivalences of weighted kappas for multiple raters
Weighted Specific-Category Kappa Measure of Interobserver Agreement
A kappa-based weighted measure (Kws) of agreement on a specific category s is proposed, with Kw being a weighted average of all the Kws values; both measures are suitable for ordinal categories because of the weights used.
Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa.
  • M. Warrens
  • Psychology
    The British journal of mathematical and statistical psychology
  • 2011
This paper presents the general function, linear in both numerator and denominator, that becomes weighted kappa after correction for chance.
Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic
It frequently occurs in psychological research that an investigator is interested in assessing the extent of interrater agreement when the data are measured on an ordinal scale. This Monte Carlo


Large sample standard errors of kappa and weighted kappa.
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all
Bivariate Agreement Coefficients for Reliability of Data
The quality of data in content analysis, in surveys with open-ended questions, in the observation of unstructured social events, and so on, critically depends on the reliability with which primary
A Coefficient of Agreement for Nominal Scales
CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of
Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
  • J. Cohen
    Psychological bulletin
  • 1968
The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) into each of the cells of the k × k table of joi.
This paper considers the mean and variance of the two statistics, kappa and weighted kappa, which are useful in measuring agreement between two raters, in the situation where they independently
Quantification of agreement in psychiatric diagnosis. A new approach.
As generally used, all of the methods used for quantifying the salient features of the data suffer from one or more deficiencies which are illustrated using the hypothetical data of Table 1.
Multiple regression as a general data-analytic system.
Techniques for using multiple regression (MR) as a general variance-accounting procedure of great flexibility, power, and fidelity to research aims in both manipulative and observational
Mental status schedule. Properties of factor-analytically derived scales.
The latest version of the MSS, designated Form A, is currently being used in a number of projects involving such varied problems as the research evaluation of treatment, case finding, routine admission assessment, and the phenomenology of mental disorders.
DIAGNO. A computer program for psychiatric diagnosis utilizing the differential diagnostic procedure.
The availability of a computer program for psychiatric diagnosis with demonstrated validity would make possible meaningful comparisons of the diagnostic composition of various populations.
The analysis of proximities: Multidimensional scaling with an unknown distance function. I.
The program is proposed as a tool for reductively analyzing several types of psychological data, particularly measures of interstimulus similarity or confusability, by making explicit the multidimensional structure underlying such data.