Reproducing Statistical Results

Victoria Stodden
The reproducibility of statistical findings has become a concern not only for statisticians, but for all researchers engaged in empirical discovery. Section 2 of this article identifies key reasons statistical findings may not replicate, including power and sampling issues; misapplication of statistical tests; the instability of findings under reasonable perturbations of data or models; lack of access to methods, data, or equipment; and cultural barriers such as researcher incentives and… 


The reproducibility of statistical results in psychological research: An investigation using unpublished raw data.

The reproducibility of the major statistical conclusions drawn in 46 articles published in 2012 in three APA journals was investigated, suggesting that APA-style reporting, even in conjunction with raw data, makes numerical verification difficult, if not impossible.

A Survey of Reporting Practices of Computer Simulation Studies in Statistical Research

The survey results of Hauck and Anderson are updated using a sample of studies applying simulation methods in statistical research to assess the extent to which the recommendations of Hoaglin and Andrews and others for conducting simulation studies have been adopted.

Metaresearch for Evaluating Reproducibility in Ecology and Evolution

It is argued that a large discrepancy between the proportion of “positive” or “significant” results and the average statistical power of empirical research and a prevailing publish‐or‐perish research culture that encourages questionable research practices constitute sufficient reason to systematically evaluate the reproducibility of the evidence base in ecology and evolution.

Reproducible Research: A Retrospective.

The origins of reproducible research are discussed, the current status of reproducibility in public health research is characterized, and reproducibility is connected to current concerns about the replicability of scientific findings.

Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition

An observational evaluation of a mandatory open data policy introduced at the journal Cognition indicated a substantial post-policy increase in data-availability statements, although not all shared data proved reusable, and there were no clear indications that original conclusions were seriously impacted.

Synthetic data for open and reproducible methodological research in social sciences and official statistics

This paper presents a synthetic but realistic dataset based on social science data that supports evaluating and developing estimators in the social sciences within a realistic framework, providing individual- and household-level data.

Influence of multiple hypothesis testing on reproducibility in neuroimaging research

The results suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments, and permutation testing is the most powerful method among the considered approaches to multiple testing.
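
The permutation approach favored above can be sketched in a few lines (a minimal illustration with made-up data, not the paper's neuroimaging pipeline): the observed mean difference is compared against the distribution of differences obtained by shuffling group labels.

```python
import random

def permutation_test(x, y, n_perm=10_000, seed=1):
    """Two-sided permutation test for a difference in means.
    Returns the fraction of label shufflings whose absolute mean
    difference is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        xs, ys = pooled[:len(x)], pooled[len(x):]
        diff = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
        if diff >= observed:
            count += 1
    return count / n_perm

# Toy data: every value in x exceeds every value in y,
# so very few shufflings reproduce the observed gap.
x = [2.1, 2.5, 2.8, 3.0, 2.6]
y = [1.2, 1.4, 1.1, 1.6, 1.3]
print(permutation_test(x, y))  # small p-value: the groups clearly differ
```

Because the null distribution is built from the data itself, the test makes no normality assumption, which is one reason permutation methods retain power after strict multiple-testing control.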

Data Analysis: Strengthening Inferences in Quantitative Education Studies Conducted by Novice Researchers

Data analysis is a significant methodological component when conducting quantitative education studies. Guidelines for conducting data analyses in quantitative education studies are common but often…



Why Most Published Research Findings Are False

Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.
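
The claim above follows from standard positive-predictive-value arithmetic, sketched below with illustrative parameter values (not taken from the paper): when prior plausibility and power are low, false positives outnumber true ones even at a nominal 5% significance level.

```python
def ppv(prior, power, alpha=0.05):
    """Probability that a 'significant' finding reflects a true effect:
    PPV = power * prior / (power * prior + alpha * (1 - prior))."""
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Underpowered study of an unlikely hypothesis: most positives are false.
print(round(ppv(prior=0.1, power=0.2), 3))  # 0.308
# Well-powered study of a plausible hypothesis: most positives are true.
print(round(ppv(prior=0.5, power=0.8), 3))  # 0.941
```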

A Systematic Statistical Approach to Evaluating Evidence from Observational Studies

Some of the challenges encountered in observational studies are reviewed, along with an alternative, data-driven approach to observational study design, execution, and analysis.

Revised standards for statistical evidence

V. Johnson, Proceedings of the National Academy of Sciences, 2013
Modifications of common standards of evidence are proposed to reduce the rate of nonreproducibility of scientific research by a factor of 5 or greater and to correct the problem of unjustifiably high levels of significance.

False-Positive Psychology

It is shown that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings, flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates, and a simple, low-cost, and straightforwardly effective disclosure-based solution is suggested.
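
The inflation from analytic flexibility can be sketched with a small Monte Carlo simulation (a simplified assumption: the flexible analyses behave like independent tests, each with a Uniform(0, 1) p-value under the null):

```python
import random

random.seed(0)

def false_positive_rate(n_tests, alpha=0.05, n_sims=100_000):
    """Fraction of null experiments declared 'significant' when the
    researcher runs n_tests analyses and reports the best one."""
    hits = 0
    for _ in range(n_sims):
        # Under the null hypothesis, each p-value is Uniform(0, 1).
        best_p = min(random.random() for _ in range(n_tests))
        if best_p < alpha:
            hits += 1
    return hits / n_sims

print(false_positive_rate(1))  # close to the nominal 0.05
print(false_positive_rate(3))  # close to 1 - 0.95**3 = 0.143, nearly triple
```

Trying just three analyses and reporting whichever "works" almost triples the false-positive rate, which is why the paper's disclosure-based remedy targets the number of analyses attempted, not only the one reported.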

A peculiar prevalence of p values just below .05

In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the…

Discovering Findings That Replicate From a Primary Study of High Dimension to a Follow-Up Study

We consider the problem of identifying whether findings replicate from one study of high dimension to another, when the primary study guides the selection of hypotheses to be examined in the…

Scientific Utopia

Strategies for improving scientific practices and knowledge accumulation are developed that account for ordinary human motivations and biases and can reduce the persistence of false findings.

Interpreting observational studies: why empirical calibration is needed to correct p-values

This experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present, and empirical calibration was found to reduce spurious results to the desired 5% level.
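
Empirical calibration of the kind described above can be sketched as follows (a toy example with invented negative-control z-scores, assuming a normal empirical null; not the authors' full method): the null distribution is estimated from effects known to be absent, then the study's p-value is recomputed against it.

```python
from statistics import NormalDist, mean, stdev

def calibrated_p(z, null_z):
    """Two-sided p-value for z against an empirical null whose mean and
    spread are estimated from negative-control z-scores."""
    mu, sigma = mean(null_z), stdev(null_z)
    dist = NormalDist(mu, sigma)
    tail = dist.cdf(z) if z < mu else 1 - dist.cdf(z)
    return 2 * tail

# Hypothetical negative controls showing systematic bias and extra spread:
null_z = [-1.5, 2.0, 0.5, -0.8, 1.8, 0.9, -1.2, 1.5, 0.3, 2.2]
z_observed = 2.0

naive_p = 2 * (1 - NormalDist().cdf(z_observed))
print(round(naive_p, 3))                    # 0.046: "significant" naively
print(calibrated_p(z_observed, null_z))     # much larger after calibration
```

Because the negative controls reveal residual confounding that a standard-normal null ignores, the calibrated p-value here no longer clears the 0.05 threshold, mirroring the paper's finding that naive observational analyses declare significance far too often.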

Again, and Again, and Again …

Replication, the confirmation of results and conclusions from one study obtained independently in another, is considered the scientific gold standard. New tools and technologies, massive amounts…