Packaging Data Analytical Work Reproducibly Using R (and Friends)

@article{Marwick2018PackagingDA,
  title={Packaging Data Analytical Work Reproducibly Using R (and Friends)},
  author={Ben Marwick and Carl Boettiger and Lincoln A. Mullen},
  journal={The American Statistician},
  year={2018},
  volume={72},
  pages={80--88}
}
ABSTRACT Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognizable way for organizing the digital materials of a research project to enable other researchers to inspect, reproduce, and… 
DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis
A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and
Sharing and organizing research products as R packages
TLDR
The R package standard, with extensions discussed herein, provides a format for assets and metadata that satisfies the above desiderata, facilitates reproducibility, open access, and sharing of materials through online platforms like GitHub and Open Science Framework.
Practical Reproducibility in Geography and Geosciences
TLDR
It is argued that all researchers working with computers should understand these technologies to control their computing environment, and the benefits of reproducible workflows in practice are presented.
Making Reproducible Research Simple Using RMarkdown and the OSF
TLDR
This paper shows a workflow for reproducible research using the R language and a set of additional packages and tools that simplify a reproducible research procedure.
A Reproducible Data Analysis Workflow
TLDR
Combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.
Creating optimal conditions for reproducible data analysis in R with ‘fertile’
TLDR
The fertile R package focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment, and is designed to educate users on why their mistakes are problematic and how to fix them.
A large-scale study on research code quality and execution
TLDR
A study of the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository finds that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices.
Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge
TLDR
The authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge, and draw on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers and cloud computing.
Reproducibility and Replicability Forum: Practical Reproducibility in Geography and Geosciences
TLDR
It is argued that all researchers working with computers should understand these technologies to control their computing environment, and it is concluded that researchers today can overcome many barriers and achieve a very high degree of reproducibility.

References

Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research
Scholarly dissemination and communication standards are changing to reflect the increasingly computational nature of scholarly research, primarily to include the sharing of the data and code
An introduction to Docker for reproducible research
TLDR
This paper examines how the popular emerging technology Docker combines several areas from systems research, such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a 'DevOps' philosophy, to address these challenges.
Reproducible Research: A Bioinformatics Case Study
  • R. Gentleman
  • Computer Science
    Statistical applications in genetics and molecular biology
  • 2005
While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has
Statistical Analyses and Reproducible Research
TLDR
This article describes a software framework for both authoring and distributing integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations in data analyses, methodological descriptions, simulations, and so on.
Data Carpentry: Workshops to Increase Data Literacy for Researchers
TLDR
Data Carpentry focuses on data literacy in particular, with the objective of teaching researchers the skills to retrieve, view, manipulate, analyze, and store their own and others' data in an open and reproducible way in order to extract knowledge from data.
R Packages
TLDR
This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham's package development philosophy; it starts with the basics and shows how to improve your package writing over time.
Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation
TLDR
Four general principles of reproducible research that have emerged in other fields are presented and an archaeological case study is described that shows how each principle can be implemented using freely available software.
Ten Simple Rules for Reproducible Computational Research
TLDR
It is emphasized that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher.
bookdown: Authoring Books and Technical Documents with R Markdown
TLDR
It is argued throughout this review that there are some substantial technical leaps that still need to be made to get authors to use bookdown on a reasonably sized book, but that it immediately solves the problem of frequently dealing with LaTeX typesetting issues by focusing on the HTML version of books instead.
Reproducible Research with R and RStudio
TLDR
This book discusses how to use R, knitr, and RStudio for reproducible research, as well as some basic concepts of data gathering and storage.