Large sample standard errors of kappa and weighted kappa.

Joseph L. Fleiss, Jacob Cohen and B. S. Everitt. Psychological Bulletin.
The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all disagreements may be considered equally serious, and weighted kappa is appropriate when the relative seriousness of the different possible disagreements can be specified. The papers describing these two statistics also present expressions for their standard errors. These expressions are incorrect, having… 
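The abstract breaks off before the corrected expressions are given. The sketch below is a minimal one that assumes the large-sample (delta-method) variance of weighted kappa in the form commonly reproduced in later texts. It should be checked against the paper itself before use. The hypothetical function `weighted_kappa_with_se` takes a k × k table of counts and agreement weights with w_ii = 1 and 0 ≤ w_ij ≤ 1 (an assumed convention). With identity weights it reduces to Cohen's unweighted kappa.

```python
import math

def weighted_kappa_with_se(counts, weights):
    """Weighted kappa and its large-sample (non-null) standard error.

    counts  -- k x k table of joint counts (rows: rater A, columns: rater B)
    weights -- k x k agreement weights, assumed w[i][i] = 1 and 0 <= w[i][j] <= 1
    """
    k = len(counts)
    n = sum(sum(r) for r in counts)
    p = [[x / n for x in r] for r in counts]                    # joint proportions p_ij
    row = [sum(r) for r in p]                                   # row margins p_i.
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]    # column margins p_.j
    cells = [(i, j) for i in range(k) for j in range(k)]

    po = sum(weights[i][j] * p[i][j] for i, j in cells)          # weighted observed agreement
    pe = sum(weights[i][j] * row[i] * col[j] for i, j in cells)  # weighted chance agreement
    kappa = (po - pe) / (1 - pe)

    # Marginal mean weights: wbar_row[i] = sum_j p_.j w_ij, wbar_col[j] = sum_i p_i. w_ij
    wbar_row = [sum(col[j] * weights[i][j] for j in range(k)) for i in range(k)]
    wbar_col = [sum(row[i] * weights[i][j] for i in range(k)) for j in range(k)]

    s = sum(p[i][j] * (weights[i][j] - (wbar_row[i] + wbar_col[j]) * (1 - kappa)) ** 2
            for i, j in cells)
    var = (s - (kappa - pe * (1 - kappa)) ** 2) / (n * (1 - pe) ** 2)
    return kappa, math.sqrt(var)


# Identity weights reduce weighted kappa to Cohen's unweighted kappa.
table = [[20, 5], [10, 15]]
kappa, se = weighted_kappa_with_se(table, [[1, 0], [0, 1]])
print(round(kappa, 3), round(se, 3))  # 0.4 0.127
```

The weighted form is written once and covers both statistics. Setting w_ij to 1 on the diagonal and 0 elsewhere turns the bracketed term into the familiar kappa-specific expression.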


Measurement of Interobserver Disagreement: Correction of Cohen’s Kappa for Negative Values

As measures of interobserver agreement for both nominal and ordinal categories, Cohen’s kappa coefficients appear to be the most widely used, and they have simple and meaningful interpretations. However, for

The standard error of Cohen's Kappa.

The conclusions are that the standard error of kappa under the null hypothesis (that kappa is zero) should not be used except when the null is plausible, and that in small samples the distribution of the kappa estimate appears highly asymmetric, so it is preferable to base confidence intervals on transformations of kappa.
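The null/non-null distinction can be made concrete with the sketch below. It computes the standard error of unweighted kappa under H0: κ = 0, assuming the form commonly attributed to Fleiss, Cohen and Everitt, and uses it for a two-sided z test. The function name `kappa_null_z_test` is hypothetical. In line with the conclusion above, this standard error is only meant for testing κ = 0. It is not suitable for confidence intervals when κ is away from zero.

```python
import math

def kappa_null_z_test(counts):
    """Cohen's kappa, its standard error under H0: kappa = 0, z, and two-sided p-value."""
    k = len(counts)
    n = sum(sum(r) for r in counts)
    row = [sum(r) / n for r in counts]                                 # p_i.
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]  # p_.j
    po = sum(counts[i][i] for i in range(k)) / n                       # observed agreement
    pe = sum(row[i] * col[i] for i in range(k))                        # chance agreement
    kappa = (po - pe) / (1 - pe)
    # Null variance: [pe + pe^2 - sum_i p_i. p_.i (p_i. + p_.i)] / [n (1 - pe)^2]
    var0 = (pe + pe ** 2 - sum(row[i] * col[i] * (row[i] + col[i]) for i in range(k))) / (
        n * (1 - pe) ** 2)
    se0 = math.sqrt(var0)
    z = kappa / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return kappa, se0, z, p_value


kappa, se0, z, p = kappa_null_z_test([[20, 5], [10, 15]])
```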

Hubert's multi-rater kappa revisited.

Hubert's (nominal) and Schuster and Smith's (ordinal) kappa coefficients are presented, together with formulae for the large-sample variances of their estimators. These formulae are used to illustrate different ways of carrying out inference and, with the use of simulation, to select the optimal procedure.

An implicit enumeration method for an exact test of weighted kappa.

An implicit enumeration algorithm is presented for conducting an exact test of weighted kappa. It can be applied to tables of non-trivial size and is particularly efficient for 'good' to 'excellent' values of weighted kappa, which typically have very small p-values.
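An exact test of this kind conditions on both sets of marginal totals. Under independence, the tables then follow the multivariate hypergeometric distribution. Because p_e(w) is fixed once the margins are fixed, κ_w increases with Σ w_ij n_ij, so that sum can serve as the test statistic. The sketch below is *not* the implicit enumeration algorithm of that paper. It uses plain brute-force enumeration of every table with the observed margins, so it is only feasible for very small tables. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def _compositions(total, caps):
    """Yield tuples of non-negative ints that sum to `total`, with x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in _compositions(total - x, caps[1:]):
            yield (x,) + rest

def _tables(rows, cols):
    """Yield every table (list of row tuples) with the given row and column totals."""
    if len(rows) == 1:
        yield [tuple(cols)]  # the last row is forced by the remaining column totals
        return
    for first in _compositions(rows[0], cols):
        remaining = [c - x for c, x in zip(cols, first)]
        for rest in _tables(rows[1:], remaining):
            yield [first] + rest

def exact_weighted_kappa_pvalue(counts, weights):
    """One-sided exact p-value P(kappa_w >= observed | both margins fixed)."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    # Hypergeometric constant: prod(r_i!) * prod(c_j!) / n!
    const = Fraction(prod(map(factorial, rows)) * prod(map(factorial, cols)), factorial(n))

    def agreement(t):
        # sum_ij w_ij n_ij; with margins fixed, kappa_w is increasing in this quantity
        return sum(w * x for w_row, t_row in zip(weights, t) for w, x in zip(w_row, t_row))

    observed = agreement(counts)
    p = Fraction(0)
    for t in _tables(rows, cols):
        if agreement(t) >= observed - 1e-12:  # small tolerance for float weights
            p += const / prod(factorial(x) for t_row in t for x in t_row)
    return p


print(exact_weighted_kappa_pvalue([[3, 0], [0, 3]], [[1, 0], [0, 1]]))  # 1/20
```

With identity weights on a 2 × 2 table, this coincides with the one-sided Fisher exact test. The number of tables grows very quickly with k and n, which is why a pruning strategy such as implicit enumeration is needed for realistic sizes.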

Assessing the reliability of ordered categorical scales using kappa-type statistics

Methods for the analysis of reliability of ordered categorical scales are discussed, focussing on the limitations of a single summary weighted kappa coefficient. A symmetric matrix of kappa-type

A general program for the calculation of the kappa coefficient

The present program provides a more comprehensive package, incorporating features found in earlier programs as well as additional features that are offered as efficient means of providing one or more of these features.

Can One Use Cohen’s Kappa to Examine Disagreement?

This research discusses the use of Cohen’s κ (kappa), Brennan and Prediger’s κn, and the coefficient of raw agreement for the examination of disagreement. Three scenarios are considered.
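The three coefficients differ only in how they treat chance agreement. Raw agreement ignores it. Cohen's κ estimates it from the two raters' marginal distributions. Brennan and Prediger's κn fixes it at 1/k for k categories. The small sketch below illustrates this; the function name and the example table are hypothetical.

```python
def agreement_indices(counts):
    """Raw agreement, Cohen's kappa, and Brennan-Prediger kappa_n for a k x k table."""
    k = len(counts)
    n = sum(sum(r) for r in counts)
    po = sum(counts[i][i] for i in range(k)) / n      # raw (observed) agreement
    row = [sum(r) / n for r in counts]
    col = [sum(c) / n for c in zip(*counts)]
    pe = sum(r * c for r, c in zip(row, col))         # chance estimated from observed margins
    return {
        "raw": po,
        "cohen_kappa": (po - pe) / (1 - pe),
        "kappa_n": (po - 1 / k) / (1 - 1 / k),        # chance fixed at 1/k
    }


# Raw agreement 5/6 with skewed margins: Cohen's kappa is pulled down, kappa_n is not.
indices = agreement_indices([[45, 5], [5, 5]])  # raw ≈ 0.833, cohen_kappa ≈ 0.4, kappa_n ≈ 0.667
```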

Interrater Agreement Measures: Comments on κn, Cohen's Kappa, Scott's π, and Aickin's α

The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., base
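Cohen's κ and Scott's π share the form (p_o − p_e)/(1 − p_e) and differ only in the chance term p_e. For κ, p_e sums the products of the two raters' own marginal proportions. For π, it sums the squared pooled (averaged) proportions. Because ((a + b)/2)² ≥ ab, π is never larger than κ, and the two coincide when the raters' marginal distributions are identical. The sketch below uses a hypothetical function name.

```python
def cohen_kappa_and_scott_pi(counts):
    """Return (Cohen's kappa, Scott's pi); they differ only in the chance-agreement term."""
    k = len(counts)
    n = sum(sum(r) for r in counts)
    po = sum(counts[i][i] for i in range(k)) / n
    row = [sum(r) / n for r in counts]
    col = [sum(c) / n for c in zip(*counts)]
    pe_cohen = sum(r * c for r, c in zip(row, col))                 # product of each rater's margins
    pe_scott = sum(((r + c) / 2) ** 2 for r, c in zip(row, col))    # squared pooled margins
    return (po - pe_cohen) / (1 - pe_cohen), (po - pe_scott) / (1 - pe_scott)


kappa, pi = cohen_kappa_and_scott_pi([[20, 5], [10, 15]])  # kappa = 0.4, pi = 13/33 ≈ 0.394
```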

A computer program to determine interrater reliability for dichotomous-ordinal rating scales

Assessing reliability using standard statistical tests involves the adoption of assumptions of normality of distribution, meaningful agreement, negligible chance agreement, and meaningful total



A Coefficient of Agreement for Nominal Scales

Consider Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of

Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.

  • J. Cohen
  • Psychological Bulletin
  • 1968
The Kw provides for the incorporation of ratio-scaled degrees of disagreement (or agreement) to each of the cells of the k × k table of joi…
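One common way to write Cohen's (1968) statistic uses disagreement weights v_ij with v_ii = 0: κ_w = 1 − Σ v_ij p_ij / Σ v_ij p_i. p_.j, where the denominator is the chance-expected weighted disagreement. Setting v_ij = 1 for every off-diagonal cell recovers unweighted kappa. The sketch below uses two frequently used schemes, linear |i − j| and quadratic (i − j)². The function names and the 3 × 3 table are hypothetical.

```python
def weighted_kappa_disagreement(counts, v):
    """kappa_w = 1 - sum(v_ij * p_ij) / sum(v_ij * p_i. * p_.j), with v[i][i] = 0."""
    k = len(counts)
    n = sum(sum(r) for r in counts)
    row = [sum(r) / n for r in counts]
    col = [sum(c) / n for c in zip(*counts)]
    observed = sum(v[i][j] * counts[i][j] / n for i in range(k) for j in range(k))
    expected = sum(v[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

def linear_weights(k):
    """Disagreement grows in proportion to the distance between categories."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]

def quadratic_weights(k):
    """Disagreement grows with the squared distance between categories."""
    return [[(i - j) ** 2 for j in range(k)] for i in range(k)]


table = [[10, 2, 0], [3, 8, 2], [0, 1, 9]]  # hypothetical ratings on a 3-point scale
print(weighted_kappa_disagreement(table, quadratic_weights(3)))  # ≈ 0.825
```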

Linear statistical inference and its applications

Algebra of Vectors and Matrices. Probability Theory, Tools and Techniques. Continuous Probability Models. The Theory of Least Squares and Analysis of Variance. Criteria and Methods of Estimation.


This paper considers the mean and variance of the two statistics, kappa and weighted kappa, which are useful in measuring agreement between two raters, in the situation where they independently