Excuse Me, Do You Have a Moment to Talk About Version Control?
@article{Bryan2018ExcuseMD,
  title={Excuse Me, Do You Have a Moment to Talk About Version Control?},
  author={Jennifer Bryan},
  journal={The American Statistician},
  year={2018},
  volume={72},
  pages={20--27}
}
ABSTRACT Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course of a project and often need to be shared with others, for reading or edits, as a project unfolds. Without explicit and structured management, project organization can easily descend into chaos, taking time away from the…
35 Citations
Implementing Version Control With Git and GitHub as a Learning Objective in Statistics and Data Science Courses
- Computer Science
- 2020
A wide range of approaches to teaching Git are presented, aiming to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.
Best practices in statistical computing
- Computer Science
- Statistics in Medicine
- 2021
The key steps are described for implementing a code quality assurance (QA) process that researchers can follow throughout a project to improve their coding practices and to ensure that the final data, code, analyses, and results are accurate and reproducible.
Expanding the Scope of Statistical Computing: Training Statisticians to Be Software Engineers
- Computer Science
- Journal of Statistics and Data Science Education
- 2019
A graduate course is described that was developed to meet the need for statisticians who can develop statistical software. It focuses on four themes: programming practices, software design, important algorithms and data structures, and essential tools and methods. The course is offered as a model for the future evolution of the computing curriculum in statistics and data science.
GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software
- Computer Science
- Front. Bioeng. Biotechnol.
- 2018
This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers” and “forks” (GitHub statistics) as a measure of software impact and suggests the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.
Collaborative Writing Workflows in the Data-Driven Classroom: A Conversation Starter
- Computer Science
- Journal of Statistics and Data Science Education
- 2022
This article proposes two writing workflows for use by students in a final-project setting that rely on a division of the labor, require a plan to be created and followed by members of a team, and involve communication outside of the final report document itself.
Reproducible Research in R: A tutorial on how to do the same thing more than once
- Computer Science
- Psych
- 2021
The R package `repro` is introduced, which guides researchers in installing and using the tools required to make a research project reproducible. The article also suggests using these tools to preregister study plans as reproducible computer code (preregistration as code; PAC).
Truth, Proof, and Reproducibility: There’s No Counter-Attack for the Codeless
- Computer Science
- Communications in Computer and Information Science
- 2019
This paper argues that mathematical science must be reoriented so that its reproducibility can be readily assessed, and it examines how proof informs the practice of computational statistical inquiry.
Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles
- Computer Science
- Data Sci. J.
- 2021
Ambiguous identification of datasets affects researchers and data centres, who are then unable to gain recognition and credit for their contributions to the collection, creation, curation, and publication of individual datasets.
Using GitHub Classroom To Teach Statistics
- Education
- Journal of Statistics Education
- 2019
GitHub Classroom aims to provide a way for students to work on and submit their assignments via Git and GitHub, giving teachers an opportunity to facilitate the integration of these version control tools into their undergraduate statistics courses.
A Fresh Look at Introductory Data Science
- Computer Science
- Journal of Statistics and Data Science Education
- 2020
A case study is presented of an introductory undergraduate data science course designed to meet the need for graduates trained in both the statistical and the computational skills required to effectively plan, acquire, manage, and analyze data and to communicate their findings.
References
Good enough practices in scientific computing
- Computer Science
- PLoS Comput. Biol.
- 2017
A set of good computing practices is presented that every researcher can adopt, regardless of their current level of computational skill. The practices encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
Git can facilitate greater reproducibility and increased transparency in science
- Computer Science
- Source Code for Biology and Medicine
- 2013
An overview of Git is provided along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.
Dynamic Documents with R and knitr
- Computer Science
- 2015
This book shows how to write reports on statistical graphics, computing, and data analysis in simple languages such as Markdown, and is suitable for both beginners and advanced users.
Ten Simple Rules for Taking Advantage of Git and GitHub
- Computer Science
- bioRxiv
- 2016
A ‘Ten Simple Rules’ guide to Git and GitHub. We describe and provide examples of how to use this software to track projects, as users, teams and organizations. We document collaborative development…
50 Years of Data Science
- Computer Science
- 2017
A vision of data science is presented based on the activities of people who are “learning from data,” and an academic field dedicated to improving that activity in an evidence-based manner is described, one able to accommodate the same short-term goals as today's data science initiatives.
Infrastructure and Tools for Teaching Computing Throughout the Statistical Curriculum
- Education
- PeerJ Preprints
- 2017
The computational infrastructure and toolkit choices that allow computing to be taught throughout the statistical curriculum are discussed, with the aims of minimizing frustration and improving adoption for both students and instructors.
R: A language and environment for statistical computing
- Computer Science
- 2014
Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice…
bookdown: Authoring Books and Technical Documents with R Markdown
- Computer Science
- 2016
This review argues that substantial technical leaps are still needed before authors will use bookdown for a reasonably sized book. It also argues that bookdown immediately solves the problem of frequently dealing with LaTeX typesetting issues, because it focuses on the HTML version of books instead.