A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong

  title={A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong},
  author={Andr{\'a}s Vargha and Harold D Delaney},
  journal={Journal of Educational and Behavioral Statistics},
  pages={101 - 132}
  • A. Vargha, H. Delaney
  • Published 1 June 2000
  • Mathematics
  • Journal of Educational and Behavioral Statistics
McGraw and Wong (1992) described an appealing index of effect size, called CL, which measures the difference between two populations in terms of the probability that a score sampled at random from the first population will be greater than a score sampled at random from the second. McGraw and Wong introduced this "common language effect size statistic" for normal distributions and then proposed an approximate estimation for any continuous distribution. In addition, they generalized CL to the n… 
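As a rough illustration of the quantity the abstract describes, the sketch below estimates the probability that a score from the first sample exceeds a score from the second in two ways: by counting all pairs (ties counted as one half, as in the definition p = P(X > Y) + 0.5P(X = Y) quoted further down this page), and by the normal-theory form Φ((μ1 − μ2) / √(σ1² + σ2²)). The function names `prob_superiority` and `cl_normal` and the sample data are hypothetical and chosen only for this example; this is a minimal sketch and not the authors' own procedure.

```python
from math import erf, sqrt
from statistics import mean, variance


def prob_superiority(x, y):
    """Pairwise sample estimate of P(X > Y) + 0.5 * P(X = Y)."""
    wins = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (wins + 0.5 * ties) / (len(x) * len(y))


def cl_normal(x, y):
    """Normal-theory estimate: Phi((mean_x - mean_y) / sqrt(var_x + var_y))."""
    z = (mean(x) - mean(y)) / sqrt(variance(x) + variance(y))
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Made-up scores, for illustration only.
first = [5, 7, 8, 9]
second = [4, 6, 7, 7]

print(prob_superiority(first, second))  # 11 wins and 2 ties out of 16 pairs -> 0.75
print(cl_normal(first, second))
```

Both functions return 0.5 when the two samples are identical, which corresponds to no tendency for either group to score higher.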

Generalizations and Extensions of the Probability of Superiority Effect Size Estimator
This work provides a suite of programs that should make it easy to use the A statistic and accompany it with a confidence interval in a wide variety of research contexts and recommends a bootstrap method that can be used for each generalization of A.
Dominance Statistics: A Simulation Study on the d Statistic
Cliff (1993) has proposed the use of a measure of effect size alternative to traditional mean differences: δ = Pr(xi1 > xj2) − Pr(xi1 < xj2), which, taken a pair of values, xi1 and xj2, from the
A probability-based measure of effect size: robustness to base rates and other factors.
  • J. Ruscio
  • Psychology
    Psychological methods
  • 2008
The probability-based measure A, the nonparametric generalization of what K. O. McGraw and S. Wong (1992) called the common language effect size statistic, is insensitive to base rates and more robust to several other factors (e.g., extreme scores, nonlinear transformations).
Beyond Cohen's d: Alternative Effect Size Measures for Between-Subject Designs
Given the long history of discussion of issues surrounding statistical testing and effect size indices and various attempts by the American Psychological Association and by the American Educational
Inferential statistics in Language Teaching Research: A review and ways forward
This article reviews all (quasi)experimental studies appearing in the first 19 volumes (1997–2015) of Language Teaching Research (LTR). Specifically, it provides an overview of how statistical
Measuring Distribution Similarities Between Samples: A Distribution-Free Overlapping Index
Using a distribution-free overlapping measure as an alternative way to quantify sample differences and assess research hypotheses expressed in terms of Bayesian evidence can considerably improve the interpretability of data analysis results in psychological research, as well as the reliability of conclusions that researchers can draw from their studies.
An improved effect size for single-case research: nonoverlap of all pairs.
Probability of bivariate superiority: A non-parametric common-language statistic for detecting bivariate relationships
This work proposes a new nonparametric statistical model that can be more intuitively understood than the conventional r: probability of bivariate superiority (PBS), and specifies the copula that forms the theoretical basis for PBS, provides an algorithm for estimating PBS from a sample, and describes the results of a Monte Carlo experiment that evaluated the algorithm across 448 data conditions.
Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data
  • J. Li
  • Psychology
    Behavior research methods
  • 2016
The results showed that Aw and dr were generally robust to these violations, and Aw slightly outperformed dr. Implications for the use of Aw and dr in real-world research are discussed.
Inferences about a Probabilistic Measure of Effect Size When Dealing with More Than Two Groups
For two independent random variables, X and Y, let p = P(X > Y) + 0.5P(X = Y), which is sometimes described as a probabilistic measure of effect size. It has been argued that for various reasons,


A common language effect size statistic.
Some of the shortcomings in interpretability and generalizability of the effect size statistics currently available to researchers can be overcome by a statistic that expresses how often a score
Dominance statistics: Ordinal analyses to answer ordinal questions.
Much behavioral research involves comparing the central tendencies of different groups, or of the same subjects under different conditions, and the usual analysis is some form of mean comparison.
A Simple, General Purpose Display of Magnitude of Experimental Effect
We introduce the binomial effect size display (BESD), which is useful because it is (a) easily understood by researchers, students, and lay persons; (b) widely applicable; and (c) conveniently
Determining whether an experimental group is stochastically larger than a control
Let μx and μy be the means of a control and an experimental group, respectively. The usual method of comparing the two groups is in terms of μx-μy by testing H0:μx = μy. This approach might suffice
Pairwise versus joint ranking: Another look at the Kruskal-Wallis statistic
SUMMARY A test statistic for the k-sample location problem is constructed by appropriately combining all pairwise two-sample Wilcoxon tests. The result is an analogue of the Kruskal-Wallis statistic
Investigation of the Robust Rank-Order Test for Non-Normal Populations with Unequal Variances: The Case of Reaction Time
Abstract This article examines the performance of the robust rank-order (Fligner-Policello) test of treatment effects for populations with unequal variances. Both symmetric (normal) and skewed
Rank Transformations and the Power of the Student T Test and Welch T' Test for Non-Normal Populations with Unequal Variances
Abstract Classical studies have disclosed that parametric significance tests such as t and F are robust under violation of homogeneity of variance, provided sample sizes are equal. But relatively
A Handbook for data analysis in the behavioral sciences : methodological issues
This book discusses methodological and statistical issues surrounding the development of Mathematical Models in Psychology, as well as some of the techniques used in Bayesian Statistics, a branch of statistics based on Bayesian inference.
The Kruskal-Wallis Test and Stochastic Homogeneity
For the comparison of more than two independent samples the Kruskal-Wallis H test is a preferred procedure in many situations. However, the exact null and alternative hypotheses, as well as the
Nonparametric Statistical Methods
Every applied statistician who wants to apply the bootstrap with some knowledge of the underlying theory, so that it is not applied improperly, should take a look at this book.