A probability-based measure of effect size: robustness to base rates and other factors.

  • John Ruscio
  • Published 1 March 2008
  • Psychology
  • Psychological Methods, 13(1)
Calculating and reporting appropriate measures of effect size are becoming standard practice in psychological research. One of the most common scenarios encountered involves the comparison of 2 groups, which includes research designs that are experimental (e.g., random assignment to treatment vs. placebo conditions) and nonexperimental (e.g., testing for gender differences). Familiar measures such as the standardized mean difference (d) or the point-biserial correlation (rpb) characterize the… 
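As a rough illustration of the measures named in the abstract, the sketch below computes the standardized mean difference (d), the point-biserial correlation (rpb), and a probability-based measure for two groups. It assumes the probability-based measure is the probability of superiority, A = P(X > Y) + 0.5 P(X = Y); the function names and sample data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def point_biserial(x, y):
    """Pearson correlation between scores and a 1/0 group indicator."""
    scores = list(x) + list(y)
    groups = [1] * len(x) + [0] * len(y)
    ms, mg = mean(scores), mean(groups)
    cov = sum((s - ms) * (g - mg) for s, g in zip(scores, groups))
    ss_s = sum((s - ms) ** 2 for s in scores)
    ss_g = sum((g - mg) ** 2 for g in groups)
    return cov / math.sqrt(ss_s * ss_g)


def prob_superiority(x, y):
    """A = P(X > Y) + 0.5 * P(X = Y), estimated over all cross-group pairs."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


# Hypothetical scores for two small groups.
x = [3, 4, 5]
y = [1, 2, 3]
print(cohens_d(x, y), point_biserial(x, y), prob_superiority(x, y))
```

Repeating one group's scores (for example `y * 5`) changes the base rate of group membership. In this sketch `prob_superiority` returns the same value for the enlarged sample, whereas `point_biserial` does not.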

Generalizations and Extensions of the Probability of Superiority Effect Size Estimator

This work provides a suite of programs that should make it easy to use the A statistic and accompany it with a confidence interval in a wide variety of research contexts and recommends a bootstrap method that can be used for each generalization of A.
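A minimal sketch of pairing the A statistic with a bootstrap confidence interval might look like the following. It uses a plain percentile bootstrap as a simple stand-in, which is an assumption and not necessarily the bootstrap variant the authors recommend. The function names, the number of resamples, and the seed are all hypothetical.

```python
import random


def prob_superiority(x, y):
    """A = P(X > Y) + 0.5 * P(X = Y), estimated over all cross-group pairs."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


def percentile_bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for A (a simplification, not a BCa interval)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        stats.append(prob_superiority(bx, by))
    stats.sort()
    alpha = 1 - level
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Hypothetical data for two groups.
treatment = [3, 4, 5, 6, 7]
control = [1, 2, 3, 4, 5]
print(prob_superiority(treatment, control))
print(percentile_bootstrap_ci(treatment, control))
```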

The influence of base rates on correlations: An evaluation of proposed alternative effect sizes with real-world data

A large sample of real-world data was used to illustrate the base rate dependence of correlations when applied to dichotomous or ordinal data, and to recommend that AUCs, Pearson/Thorndike adjusted correlations, Cohen’s d, or polychoric correlations be considered as alternative effect size statistics in many contexts.

A robust effect size measure Aw for MANOVA with non-normal and non-homogenous data

A common research question in psychology entails examining whether significant group differences (e.g. male and female) can be found in a list of numeric variables that measure the same underlying

Confidence Intervals for the Probability of Superiority Effect Size Measure and the Area Under a Receiver Operating Characteristic Curve

Based on the simulation study results, the bias-corrected and accelerated bootstrap method is recommended for constructing a CI for the A statistic; bootstrap methods also provided the least biased and most accurate standard error of A.

Beyond Cohen's d: Alternative Effect Size Measures for Between-Subject Designs

Given the long history of discussion of issues surrounding statistical testing and effect size indices and various attempts by the American Psychological Association and by the American Educational

Persons as Effect Sizes

Traditional indices of effect size are designed to answer questions about average group differences, associations between variables, and relative risk. For many researchers, an additional, important


Researchers are encouraged to report statistics in their studies, including measures of effect size and confidence intervals (CIs). The probability of superiority (A) has many appealing

A psychometric analysis of choice reaction time measures

This tutorial provides an introduction to PIMs: it discusses several of their theoretical properties, explains why they could be useful for the behavioral sciences, and shows how they can be used in practice with the R package pim.

Comparing the relative fit of categorical and dimensional latent variable models using consistency tests.

An approach to consistency testing is presented that builds on prior work demonstrating that parallel analyses of categorical and dimensional comparison data provide an accurate index of the relative fit of competing structural models.



Review of assumptions and problems in the appropriate conceptualization of effect size.

Estimation of the effect size parameter, D, the standardized difference between population means, is sensitive to heterogeneity of variance (heteroscedasticity), which seems to abound in psychological data, and various proposed solutions are reviewed, including measures that do not make these assumptions.

When effect sizes disagree: the case of r and d.

The authors demonstrate the issue by focusing on two popular effect-size measures, the correlation coefficient and the standardized mean difference, both of which can be used when one variable is dichotomous and the other is quantitative.

Biases of success rate differences shown in binomial effect size displays.

  • L. Hsu
  • Psychology
  • Psychological methods
  • 2004
Differences in the sizes of biases linked to different correlations suggest that BESD SRDs reported for different correlations are not comparable.

A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong

McGraw and Wong (1992) described an appealing index of effect size, called CL, which measures the difference between two populations in terms of the probability that a score sampled at random from
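For orientation, CL is commonly computed under a normality assumption as Φ((M1 − M2) / √(S1² + S2²)), where Φ is the standard normal distribution function. The sketch below follows that formula; the function and parameter names are hypothetical, and it does not reproduce the improvement proposed in the critique.

```python
from math import sqrt
from statistics import NormalDist


def common_language_es(mean1, sd1, mean2, sd2):
    """CL under normality: P(score from group 1 > score from group 2)."""
    # The difference of two independent normal scores is itself normal,
    # with mean (mean1 - mean2) and variance (sd1**2 + sd2**2).
    z = (mean1 - mean2) / sqrt(sd1 ** 2 + sd2 ** 2)
    return NormalDist().cdf(z)


# Hypothetical group summaries: identical groups give CL = 0.5.
print(common_language_es(0.0, 1.0, 0.0, 1.0))
```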

Probability of the superior outcome of one treatment over another.

An intuitively appealing indicator of magnitude of effect in applied research is an estimate of the probability of the superior outcome of one treatment over another. Parametric and nonparametric

Statistical Practices of Educational Researchers: An Analysis of their ANOVA, MANOVA, and ANCOVA Analyses

Articles published in several prominent educational journals were examined to investigate the use of data analysis tools by researchers in four research paradigms: between-subjects univariate

Comparing Effect Sizes in Follow-Up Studies: ROC Area, Cohen's d, and r

This work outlines why AUC is the preferred measure of predictive or diagnostic accuracy in forensic psychology or psychiatry, and urges researchers and practitioners to use numbers rather than verbal labels to characterize effect sizes.

Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples.

Three robust tests of stochastic equality are identified that perform well in Type I error rates and power except when extremely skewed data co-occur with very small n.

An alternative to Cohen's standardized mean difference effect size: a robust parameter and confidence interval in the two independent groups case.

The authors argue that a robust version of Cohen's effect size constructed by replacing population means with 20% trimmed means and the population standard deviation with the square root of a 20%

A common language effect size statistic.

Some of the shortcomings in interpretability and generalizability of the effect size statistics currently available to researchers can be overcome by a statistic that expresses how often a score