Excuse Me, Do You Have a Moment to Talk About Version Control?

  • Jennifer Bryan
  • Published 2 January 2018
  • Computer Science
  • The American Statistician, pages 20-27
ABSTRACT Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course of a project and often need to be shared with others, for reading or edits, as a project unfolds. Without explicit and structured management, project organization can easily descend into chaos, taking time away from the… 

Implementing Version Control With Git and GitHub as a Learning Objective in Statistics and Data Science Courses

A wide range of approaches to teaching Git are presented, aiming to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.

Best practices in statistical computing

The key steps for implementing a code quality assurance (QA) process are described; researchers can follow them throughout a project to improve their coding practices and to ensure that the final data, code, analyses, and results are accurate and reproducible.

Expanding the Scope of Statistical Computing: Training Statisticians to Be Software Engineers

A graduate course developed to meet the need for statisticians to develop statistical software, focusing on four themes (programming practices, software design, important algorithms and data structures, and essential tools and methods), is seen as a model for the future evolution of the computing curriculum in statistics and data science.

GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software

  • M. Dozmorov
  • Computer Science
    Front. Bioeng. Biotechnol.
  • 2018
This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers” and “forks” (GitHub statistics) as a measure of software impact and suggests the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.

Collaborative Writing Workflows in the Data-Driven Classroom: A Conversation Starter

  • S. Stoudt
  • Computer Science
    Journal of Statistics and Data Science Education
  • 2022
This article proposes two writing workflows for use by students in a final-project setting that rely on a division of labor, require a plan to be created and followed by members of a team, and involve communication outside of the final report document itself.

Reproducible Research in R: A tutorial on how to do the same thing more than once

The R package `repro` is introduced, which guides researchers in the installation and use of the tools required for making a research project reproducible, and the use of the proposed tools for the preregistration of study plans as reproducible computer code (preregistration as code; PAC) is suggested.

Truth, Proof, and Reproducibility: There’s No Counter-Attack for the Codeless

This paper proposes that a reorientation of mathematical science is necessary so that its reproducibility can be readily assessed and examines how proof informs the practice of computational statistical inquiry.

Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles

Ambiguous identification of datasets impacts researchers and data centres who are unable to gain recognition and credit for their contributions to the collection, creation, curation and publication of individual datasets.

Using GitHub Classroom To Teach Statistics

GitHub Classroom aims to provide a way for students to work on and submit their assignments via Git and GitHub, giving teachers an opportunity to facilitate the integration of these version control tools into their undergraduate statistics courses.

A Fresh Look at Introductory Data Science

A case study of an introductory undergraduate course in data science designed to address the needs of graduates trained in both the statistical and the computational set of skills required to effectively plan, acquire, manage, analyze, and communicate the findings of such data.



Good enough practices in scientific computing

A set of good computing practices that every researcher can adopt, regardless of their current level of computational skill, is presented; the practices encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.

Git can facilitate greater reproducibility and increased transparency in science

  • Karthik Ram
  • Computer Science
    Source Code for Biology and Medicine
  • 2013
An overview of Git is provided along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.

Dynamic Documents with R and knitr

This book shows you how to write reports in simple languages such as Markdown for statistical graphics, computing, and data analysis, suitable for both beginners and advanced users.

Ten Simple Rules for Taking Advantage of Git and GitHub

A ‘Ten Simple Rules’ guide to git and GitHub. We describe and provide examples on how to use these software to track projects, as users, teams and organizations. We document collaborative development

50 Years of Data Science

A vision of data science is presented based on the activities of people who are “learning from data,” and an academic field dedicated to improving that activity in an evidence-based manner is described, being able to accommodate the same short-term goals.

Infrastructure and Tools for Teaching Computing Throughout the Statistical Curriculum

The computational infrastructure and toolkit choices to allow for these pedagogical innovations while minimizing frustration and improving adoption for both students and instructors are discussed.

R: A language and environment for statistical computing.

Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice

The Species Problem in Iris

bookdown: Authoring Books and Technical Documents with R Markdown

It is argued throughout this review that there are some substantial technical leaps that still need to be made to get authors to use bookdown on a reasonably sized book, but that it immediately solves the problem of frequent dealing with LaTeX typesetting issues by focusing on the HTML version of books instead.