Can One Use Cohen’s Kappa to Examine Disagreement?

Alexander von Eye and Maxine von Eye
Abstract. This research discusses the use of Cohen’s κ (kappa), Brennan and Prediger’s κn, and the coefficient of raw agreement for the examination of disagreement. Three scenarios are considered. The first involves all disagreement cells in a rater × rater cross-tabulation. The second involves one of the triangles of disagreement cells. The third involves the cells that indicate disagreement by one (ordinal) scale unit. For each of these three scenarios, coefficients of disagreement in the… 
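The three coefficients named in the abstract can be sketched on a toy rater × rater cross-tabulation. The counts below are invented purely for illustration and are not taken from the paper; the formulas are the standard definitions of raw agreement, Cohen's κ, and Brennan and Prediger's κn.

```python
# Hypothetical 3x3 rater-by-rater cross-tabulation
# (rows: Rater A's categories, columns: Rater B's categories).
table = [
    [20, 5, 1],
    [4, 15, 3],
    [2, 6, 14],
]

n = sum(sum(row) for row in table)
k = len(table)

# Coefficient of raw agreement: proportion of cases in the diagonal cells.
p_o = sum(table[i][i] for i in range(k)) / n

# Cohen's kappa: chance agreement p_e from the products of the observed marginals.
row_m = [sum(row) / n for row in table]
col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
p_e = sum(row_m[i] * col_m[i] for i in range(k))
kappa = (p_o - p_e) / (1 - p_e)

# Brennan and Prediger's kappa_n: chance agreement fixed at 1/k (uniform marginals).
kappa_n = (p_o - 1 / k) / (1 - 1 / k)

print(round(p_o, 3), round(kappa, 3), round(kappa_n, 3))
```

Because κn replaces the marginal-based chance term with 1/k, it is free of the marginal dependency that κ is often criticized for; on the same table the two can differ noticeably.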


Exploring rater agreement: configurations of agreement and disagreement

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used for descriptive and explanatory purposes.

On the Marginal Dependency of Cohen’s κ

Cohen’s κ (kappa) is typically used as a measure of the degree of rater agreement. It is often criticized because it is marginal-dependent. In this article, this characteristic is explained and …

Assessing agreement between raters from the point of coefficients and loglinear models

Abstract: In square contingency tables, the analysis of agreement between row and column classifications is of interest. For nominal categories, the kappa coefficient is used to summarize the degree of …

Estimation of symmetric disagreement using a uniform association model for ordinal agreement data

Cohen’s kappa is probably the most widely used measure of agreement. Measuring the degree of agreement or disagreement between two raters in square contingency tables is often of interest. Modeling the …

The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen’s Kappa and Brier Score in Binary Classification Assessment

The Matthews correlation coefficient (MCC) is compared with two popular scores: Cohen’s Kappa, a metric that originated in social sciences, and the Brier score, a strictly proper scoring function which emerged in weather forecasting studies.
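For a binary table, the two scores this comparison refers to can be computed side by side. The confusion-matrix counts below are hypothetical, chosen only to show that MCC and κ can disagree on the same data.

```python
from math import sqrt

# Hypothetical 2x2 confusion matrix: true/false positives and negatives.
tp, fn, fp, tn = 45, 5, 15, 35
n = tp + fn + fp + tn

# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa for the same table: observed vs. chance-expected agreement.
p_o = (tp + tn) / n
p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
kappa = (p_o - p_e) / (1 - p_e)

print(round(mcc, 3), round(kappa, 3))
```

On this table MCC (≈0.612) and κ (0.6) are close but not equal; the gap widens as the marginals become more imbalanced, which is the regime the cited comparison is concerned with.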

Characteristics of measures of directional dependence-Monte Carlo studies

Recent results (Dodge & Rousson, 2000; von Eye & DeShon, 2008) show that, in the context of linear models, the response variable will always have less skew than the explanatory variable. This applies …

Agreement plus Disagreement Model for Agreement Data

In R × R square contingency tables, where there is a one-to-one correspondence between the categories of the row and column variables, the agreement between the row and column classifications is of …

Mentor-protégé expectation agreement, met expectations, and perceived effort


Comparative analysis of the use of space in 7-a-side and 8-a-side soccer: how to determine minimum sample size in observational methodology

The present study examines the relative suitability of the 7-a-side and 8-a-side formats for developing the skills of young players in Spain and simulates an increase in sample size while maintaining the characteristics of the original data (frequencies, variability, and distribution).

How essential is kratom availability and use during COVID-19? Use pattern analysis based on survey and social media data

Clinicians and public health officials need to be informed and educated about kratom use as a potential mitigation strategy for substance use disorders and for self-treatment of pain.



An Alternative to Cohen's κ

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used. The most popular of these is Cohen's κ. In …

Large sample standard errors of kappa and weighted kappa.

The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all …

Interrater Agreement Measures: Comments on κn, Cohen's Kappa, Scott's π, and Aickin's α

The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., base …

Significance Tests for the Measure of Raw Agreement

Significance tests for the measure of raw agreement are proposed. First, it is shown that the measure of raw agreement can be expressed as a proportionate reduction-in-error measure, sharing this …

Coefficient Kappa: Some Uses, Misuses, and Alternatives

This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these …

Modelling patterns of agreement and disagreement

  • A. Agresti
  • Psychology
    Statistical methods in medical research
  • 1992
A survey of ways of statistically modelling patterns of observer agreement and disagreement is presented, with main emphasis on modelling inter-observer agreement for categorical responses, both for nominal and ordinal response scales.

A Coefficient of Agreement for Nominal Scales

Consider Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of …
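The coefficient Cohen introduced in this paper is conventionally written in terms of the observed agreement p_o and the chance-expected agreement p_e:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \sum_i p_{ii},
\qquad
p_e = \sum_i p_{i+}\, p_{+i},
```

where p_{ii} are the diagonal cell proportions of the rater × rater table and p_{i+}, p_{+i} are its row and column marginal proportions.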

Models of Chance when Measuring Interrater Agreement with Kappa

Two measures of reliability for nominal scales are compared: coefficient kappa and κn, a modification suggested for agreement matrices with free marginals. It is illustrated that the evaluation of …

A mixture model approach to indexing rater agreement.

  • C. Schuster
  • Economics
    The British journal of mathematical and statistical psychology
  • 2002
This paper discusses a class of mixture models defined by having a quasi-symmetric log-linear representation, which has two interesting properties; among them, a model-based estimate of rater reliability can be obtained from the simple quasi-symmetric agreement models.

Kappa as a Parameter of a Symmetry Model for Rater Agreement

If two raters assign targets to categories, the ratings can be arranged in a two-dimensional contingency table. A model for the frequencies in such a contingency table is presented for which Cohen’s …