Power Calculations for Replication Studies

Charlotte Micheloud and Leonhard Held. Statistical Science.
The reproducibility crisis has led to an increasing number of replication studies being conducted. Sample sizes for replication studies are often calculated using conditional power based on the effect estimate from the original study. However, this approach is not well suited as it ignores the uncertainty of the original result. Bayesian methods are used in clinical trials to incorporate prior information into power calculations. We propose to adapt this methodology to the replication framework… 
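The contrast drawn above between conditional power and a Bayesian alternative can be sketched numerically. The following is a minimal illustration, not the paper's own code: it assumes normally distributed effect estimates, a one-sided significance level `alpha` for the replication study, and a flat initial prior, so that the effect given the original estimate is normal around that estimate with the original squared standard error as variance. The names `z_o` (original z-value), `c` (ratio of original to replication variance, roughly the relative sample size) and all function names are hypothetical choices made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def conditional_power(z_o, c, alpha=0.025):
    """Power of the replication if the true effect equals the original estimate.

    z_o: z-value of the original study (estimate / standard error).
    c:   original variance divided by replication variance (assumed ~ n_r / n_o).
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(sqrt(c) * z_o - z_alpha)


def predictive_power(z_o, c, alpha=0.025):
    """Power averaged over the uncertainty of the original estimate.

    Assumes a flat prior, so the replication estimate has predictive
    variance sigma_o^2 + sigma_r^2 instead of sigma_r^2 alone.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf((sqrt(c) * z_o - z_alpha) / sqrt(1 + c))


def relative_sample_size_conditional(z_o, power=0.8, alpha=0.025):
    """Relative sample size c that reaches the target conditional power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ((z_alpha + z_beta) / z_o) ** 2


if __name__ == "__main__":
    z_o = 2.5  # hypothetical original z-value
    c = relative_sample_size_conditional(z_o, power=0.8)
    print("relative sample size:", c)
    print("conditional power:   ", conditional_power(z_o, c))
    print("predictive power:    ", predictive_power(z_o, c))
```

Under these assumptions the predictive power is always pulled toward 50% relative to the conditional power, because dividing by `sqrt(1 + c)` shrinks the argument of the normal distribution function toward zero. It also cannot exceed the normal distribution function evaluated at `z_o`, however large the replication study is, which is one way to see what ignoring the uncertainty of the original result costs.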

The replication of non-inferiority and equivalence studies

Replication studies are increasingly conducted to assess credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but there is a lack of

The sceptical Bayes factor for the assessment of replication success

  • Samuel Pawel, L. Held
  • Business
    Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2022
The proposed method elegantly combines traditional notions of replication success; it ensures that both studies need to show evidence against the null, while at the same time penalising incompatibility of their effect estimates.

The sceptical Bayes factor

There is an urgent need to develop new methodology for the design and analysis of replication studies. Recently, a reverse-Bayes method called the sceptical p-value has been proposed for this

Conditional Drug Approval with the Harmonic Mean Chi-Squared Test

Approval of treatments in areas of high medical need may not follow the two-trials paradigm, but might be granted under conditional approval. Under conditional approval, the evidence for a treatment

The assessment of replication success based on relative effect size

The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies, and to show how this method is directly related to the relative effect size, the ratio of the replication to the original effect estimate.

Bayesian approaches to designing replication studies

Sample size determination is investigated in the normal-normal hierarchical model, where analytical results are available and traditional sample size determination arises as a special case in which the uncertainty on parameter values is not accounted for.

Beyond the two-trials rule: Type-I error control and sample size planning with the sceptical $p$-value

We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical p-value. A recalibration is proposed to obtain exact

How large should the next study be? Predictive power and sample size requirements for replication studies

We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power of an experiment given

Combining Evidence from Clinical Trials in Conditional or Accelerated Approval

The applicability of the recently developed harmonic mean χ²-test to this conditional or accelerated approval framework is studied to aid in the design and assessment of the required post-market studies in terms of the level of evidence required for full approval.



A new standard for the analysis and design of replication studies

  • L. Held
  • Business
    Journal of the Royal Statistical Society: Series A (Statistics in Society)
  • 2019
A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse Bayes technique with prior‐predictive tail probabilities to define

Probabilistic forecasting of replication studies

The results suggest that many of the estimates from the original studies were inflated, possibly caused by publication bias or questionable research practices, and also that some degree of heterogeneity between original and replication effects should be expected.

Power estimation and sample size determination for replication studies of genome-wide association studies

An Empirical Bayes (EB) based method is proposed to estimate the power of a replication study for each association; it can objectively determine the replication study's sample size using information extracted from the primary study.

Addressing the “Replication Crisis”: Using Original Studies to Design Replication Studies with Appropriate Statistical Power

Simulation results imply that even if original studies reflect actual phenomena and were conducted in the absence of questionable research practices, popular approaches to designing replication studies may result in a low success rate, especially if the original study is underpowered.

Performing High-Powered Studies Efficiently with Sequential Analyses

Running studies with high statistical power, while effect size estimates in psychology are often inaccurate, leads to a practical challenge when designing an experiment. This challenge can be

A review of methods for futility stopping based on conditional power

  • J. Lachin
  • Mathematics
    Statistics in medicine
  • 2005
An iterative procedure is described that determines a stopping boundary on the B-value and a final test critical Z-value with specified type I and II error probabilities; its implementation in conjunction with a group sequential analysis for effectiveness is also described.

What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science

  • Prasad Patil, R. Peng, J. Leek
  • Psychology
    Perspectives on psychological science : a journal of the Association for Psychological Science
  • 2016
The results of the Reproducibility Project: Psychology can be viewed as statistically consistent with what one might expect when performing a large-scale replication experiment.

The Role of p-Values in Judging the Strength of Evidence and Realistic Replication Expectations

p-Values are viewed by many as the root cause of the so-called replication crisis, which is characterized by the prevalence of positive scientific findings that are contradicted in

Conditional power and friends: The why and how of (un)planned, unblinded sample size recalculations in confirmatory trials

It is shown that commonly discussed sample size recalculation rules lead to paradoxical adaptations where an initially planned optimal design is not invariant under the adaptation rule even if the planning assumptions do not change, and two alternatives are proposed to avoid such inconsistencies.