Reproducible Research: A Retrospective

  • Roger D. Peng, Stephanie C. Hicks
  • Published 23 July 2020
  • Biology
  • Annual Review of Public Health
Advances in computing technology have spurred two extraordinary phenomena in science: large-scale and high-throughput data collection coupled with the creation and implementation of complex statistical algorithms for data analysis. These two phenomena have brought about tremendous advances in scientific discovery but have raised two serious concerns. The complexity of modern data analyses raises questions about the reproducibility of the analyses, meaning the ability of independent analysts to… 

Computing with R-INLA: Accuracy and reproducibility with implications for the analysis of COVID-19 data

The results suggest that existing methods for assessing the accuracy of the INLA technique may not support how COVID-19 researchers are using it, and a minimum set of guidelines is proposed for researchers using statistical methodologies validated primarily through simulation studies.

Perspective on Data Science

  • R. Peng, H. Parker
  • Computer Science
    Annual Review of Statistics and Its Application
  • 2021
This review attempts to distill some core ideas from data science by focusing on the iterative process of data analysis, and to develop some generalizations from past experience that form the basis of a theory of data science.

The Practice of Ensuring Repeatable and Reproducible Computational Models

This chapter describes guidelines that researchers can use to help ensure their work is repeatable, and suggests a scoring system authors can use to gauge how well they are doing.

Reproducible Research and GIScience: An Evaluation Using GIScience Conference Papers

A rubric for assessing reproducibility is applied to 75 papers published in the GIScience conference series in the years 2012-2018, and previous recommendations for improving the situation are summarised and adapted.

A simple kit to use computational notebooks for more openness, reproducibility, and productivity in research

A starter kit is presented to facilitate the use of computational notebooks throughout the research process, including publication; the hope is that such a minimalist yet effective starter kit will encourage researchers to adopt this practice in their workflow, regardless of their computational background.

Promoting Open Science Through Research Data Management

Describing data management as an integral part of a research process or workflow may help contextualize the importance of related resources, practices, and concepts for researchers who may be less familiar with them.

The Quartet Data Portal: integration of community-wide resources for multiomics quality control

The Quartet Data Portal continuously collects, evaluates, and integrates the community-generated data of the distributed Quartet multiomic reference materials, and provides analysis pipelines to assess the quality of user-submitted multiomic data.

Ten simple rules for maximizing the recommendations of the NIH data management and sharing plan

10 key recommendations for creating a DMSP that is both maximally compliant and effective are provided.

An Open-Access Data Platform: Global Nutrition and Health Atlas (GNHA)

The need for an integrated nutrition data platform is articulated: a web-based platform that can collect, store, track, analyze, monitor, and visually display key metrics in nutrition and health, while allowing users to interact with visuals and download the data provided in the platform.

FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow

FA-nf is a functional annotation pipeline implemented in Nextflow, a versatile computational workflow management engine. It integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG, and produces several output files, including GO assignments, output summaries of the aforementioned programs, and final annotation reports.

Ten Simple Rules for Reproducible Computational Research

It is emphasized that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher.

Reproducible epidemiologic research.

A standard for reproducibility is outlined and methods for reproducible research are proposed and implemented by use of a case study in air pollution and health.

Estimating the reproducibility of psychological science

A large-scale assessment suggests that experimental reproducibility in psychology leaves a lot to be desired, and correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.

An estimate of the science-wise false discovery rate and application to the top medical literature.

Estimation methods from the genomics community are adapted to the problem of estimating the rate of false discoveries in the medical literature, using reported P-values as the data; the results suggest that the medical literature remains a reliable record of scientific progress.
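The genomics-style estimation alluded to here can be illustrated with a Storey-type estimate of the proportion of true null hypotheses, a standard tool from that community (the paper's exact mixture-model method may differ). This sketch, with assumed simulation settings, shows the idea of recovering the null fraction from a collection of reported p-values:

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    # Storey-style estimate of the fraction of true nulls: under the null,
    # p-values are Uniform(0, 1), so counts above `lam` come mostly from
    # null hypotheses and can be scaled up to estimate pi0.
    p = np.asarray(pvalues)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))

# Simulated "literature": 80% null results (uniform p-values) mixed with
# 20% true effects (p-values concentrated near zero).
rng = np.random.default_rng(42)
p_mixed = np.concatenate([rng.uniform(size=8000),
                          rng.beta(0.5, 10.0, size=2000)])
pi0_hat = storey_pi0(p_mixed)  # should land near the true value of 0.8
```

The null fraction estimated this way, combined with a significance threshold, yields a rough false discovery rate for the body of reported results.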

What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science

  • Prasad Patil, R. Peng, J. Leek
  • Psychology
    Perspectives on Psychological Science: A Journal of the Association for Psychological Science
  • 2016
The results of the Reproducibility Project: Psychology can be viewed as statistically consistent with what one might expect when performing a large-scale replication experiment.

Statistical Analyses and Reproducible Research

This article describes a software framework for both authoring and distributing integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations in data analyses, methodological descriptions, simulations, and so on.

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

This report examines several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response and shows in five case studies that the results incorporate several simple errors that may be putting patients at risk.

Why Most Published Research Findings Are False

Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.

Reproducing Statistical Results

Five proposed remedies for replication failures are discussed: improved prepublication and postpublication validation of findings; the complete disclosure of research steps; assessment of the stability of statistical findings; providing access to digital research objects, in particular data and software; and ensuring these objects are legally reusable.

Elements and Principles of Data Analysis

It is argued that the elements and principles of data analysis lay the foundational framework for a more general theory of data science.