Packaging Data Analytical Work Reproducibly Using R (and Friends)
@article{Marwick2018PackagingDA, title={Packaging Data Analytical Work Reproducibly Using R (and Friends)}, author={Ben Marwick and Carl Boettiger and Lincoln A. Mullen}, journal={The American Statistician}, year={2018}, volume={72}, pages={80--88} }
ABSTRACT Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognizable way for organizing the digital materials of a research project to enable other researchers to inspect, reproduce, and…
67 Citations
DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis
- Computer Science, Gates open research
- 2018
A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and…
DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis
- Computer Science, bioRxiv
- 2018
A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and…
Sharing and organizing research products as R packages
- Computer Science, Behavior research methods
- 2020
The R package standard, with extensions discussed herein, provides a format for assets and metadata that satisfies the above desiderata, facilitates reproducibility, open access, and sharing of materials through online platforms like GitHub and Open Science Framework.
Practical Reproducibility in Geography and Geosciences
- Computer Science
- 2020
It is argued that all researchers working with computers should understand these technologies to control their computing environment, and the benefits of reproducible workflows in practice are presented.
Making Reproducible Research Simple Using RMarkdown and the OSF
- Computer Science, HCI
- 2020
This paper shows a workflow for reproducible research using the R language and a set of additional packages and tools that simplify a reproducible research procedure.
A Reproducible Data Analysis Workflow
- Computer Science, Quantitative and Computational Methods in Behavioral Sciences
- 2021
Combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.
Creating optimal conditions for reproducible data analysis in R with ‘fertile’
- Computer Science, Stat
- 2021
The fertile R package focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. It is designed to educate users on why their mistakes are problematic and how to fix them.
A large-scale study on research code quality and execution
- Computer Science, Scientific data
- 2022
A study of the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository finds that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices.
Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge
- Biology, Socius: Sociological Research for a Dynamic World
- 2019
The authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge, and draw on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers and cloud computing.
Practical Reproducibility in Geography and Geosciences: Reproducible Workflows in Geography and Geosciences (Reproducibility and Replicability Forum)
- Computer Science
- 2020
It is argued that all researchers working with computers should understand these technologies to control their computing environment, and it is concluded that researchers today can overcome many barriers and achieve a very high degree of reproducibility.
References
Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research
- Computer Science
- 2013
Scholarly dissemination and communication standards are changing to reflect the increasingly computational nature of scholarly research, primarily to include the sharing of the data and code…
An introduction to Docker for reproducible research
- Computer Science, OPSR
- 2015
This article examines how the popular emerging technology Docker combines several areas from systems research, such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a 'DevOps' philosophy, to address these challenges.
Reproducible Research: A Bioinformatics Case Study
- Computer Science, Statistical applications in genetics and molecular biology
- 2005
While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has…
Statistical Analyses and Reproducible Research
- Computer Science
- 2007
This article describes a software framework for both authoring and distributing integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations in data analyses, methodological descriptions, simulations, and so on.
Data Carpentry: Workshops to Increase Data Literacy for Researchers
- Computer Science
- 2015
Data Carpentry focuses on data literacy in particular, with the objective of teaching researchers the skills to retrieve, view, manipulate, analyze, and store their own and others' data in an open and reproducible way in order to extract knowledge from data.
R Packages
- Computer Science
- 2015
This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham's package development philosophy. It starts with the basics and shows how to improve your package writing over time.
Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation
- Computer Science
- 2017
Four general principles of reproducible research that have emerged in other fields are presented and an archaeological case study is described that shows how each principle can be implemented using freely available software.
Ten Simple Rules for Reproducible Computational Research
- Biology, PLoS Comput. Biol.
- 2013
It is emphasized that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher.
bookdown: Authoring Books and Technical Documents with R Markdown
- Computer Science
- 2016
It is argued throughout this review that some substantial technical leaps still need to be made to get authors to use bookdown on a reasonably sized book, but that it immediately solves the problem of frequently dealing with LaTeX typesetting issues by focusing on the HTML version of books instead.
Reproducible Research with R and RStudio
- Computer Science
- 2013
This book discusses how to use R, knitr, and RStudio for reproducible research, as well as some basic concepts of data gathering and storage.