Good enough practices in scientific computing

@article{Wilson2017GoodEP,
  title={Good enough practices in scientific computing},
  author={G. Wilson and J. Bryan and K. Cranston and J. Kitzes and Lex Nederbragt and Tracy K. Teal},
  journal={PLoS Computational Biology},
  year={2017},
  volume={13}
}
Author summary: Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for…
Short-format Workshops Build Skills and Confidence for Researchers to Work with Data
TLDR
Results show these two-day coding workshops increase researchers’ daily programming usage, and sixty-five percent of respondents have gained confidence in working with data and open source tools as a result of completing the workshop.
Excuse me, do you have a moment to talk about version control?
  • J. Bryan
  • Computer Science
  • PeerJ Prepr.
  • 2017
TLDR
The use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows is described, and special attention is given to projects that use the statistical language R and, optionally, R Markdown documents.
Best Practices in Structuring Data Science Projects
TLDR
This paper surveys three sources of information on how to structure projects (common management methodologies, community best practices, and data sharing platforms) and provides hints on tools that can help manage such structures efficiently.
Excuse Me, Do You Have a Moment to Talk About Version Control?
TLDR
The use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows is described, with special attention given to projects that use the statistical language R and, optionally, R Markdown documents.
Building a local community of practice in scientific programming for life scientists
TLDR
The authors argue that life scientists facing the current data deluge can benefit from these small communities, and they propose a model for building such a community of practice at a local academic institution.
Key Attributes of a Modern Statistical Computing Tool
TLDR
A modern statistical computing tool should be accessible, provide easy entry, privilege data as a first-order object, support exploratory and confirmatory analysis, allow for flexible plot creation, support randomization, be interactive, include inherent documentation, support narrative, publishing, and reproducibility, and be flexible to extensions.
Ten Simple Rules for Reproducible Research in Jupyter Notebooks
TLDR
A set of rules is developed to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications.
Building a local community of practice in scientific programming for Life Scientists
TLDR
A three-step field guide for building a local community of practice in scientific programming for life scientists, based on the view that such small communities can help life scientists cope with the data deluge they will increasingly face.
Exchanging Best Practices and Tools for Supporting Computational and Data-Intensive Research, The Xpert Network
TLDR
The paper describes the initiative, the Xpert Network, where participants exchange successes, challenges, and general information about their activities, leading to increased productivity, efficiency, and coordination in the ever-growing community of scientists that use computational and data-intensive research methods.
Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge
Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study…

References

SHOWING 1-10 OF 38 REFERENCES
A Quick Guide to Organizing Computational Biology Projects
TLDR
The purpose of this article is to describe one good strategy for carrying out computational experiments, and to focus on relatively mundane issues such as organizing files and directories and documenting progress.
Ten Simple Rules for Reproducible Computational Research
TLDR
It is emphasized that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher.
Ten Simple Rules for Digital Data Storage
TLDR
“Data curation practices must continue to keep pace with the changes brought about by new forms and practices of data collection and storage,” according to the authors.
Clean Code - a Handbook of Agile Software Craftsmanship
TLDR
Noted software expert Robert C. Martin, who has helped bring agile principles from a practitioner's point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code on the fly into a book that will instill within you the values of a software craftsman.
Code complete - a practical handbook of software construction, 2nd Edition
TLDR
This book focuses on programming technique rather than the requirements of a specific programming language or environment. Topics include front-end planning, applying good design techniques to construction, using data effectively, using common and advanced control structures, secrets of self-documenting code, and testing and debugging techniques.
The Checklist Manifesto: How to Get Things Right
Today we find ourselves in possession of stupendous know-how, which we willingly place in the hands of the most highly skilled people. But avoidable failures are common, and the reason is simple: the…
Understanding Open Source and Free Software Licensing
If you've held back from developing open source or free software projects because you don't understand the implications of the various licenses, you're not alone. Many developers believe in releasing…
Best Practices for Scientific Computing
We describe a set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software.
Code and Data for the Social Sciences: A Practitioner's Guide
TLDR
Test whether per capita potato chip consumption in a county is correlated with the average per capita potato chip consumption among other counties in the same state to eliminate redundancy and improve clarity.
The magical number seven plus or minus two: some limits on our capacity for processing information.
TLDR
The theory provides us with a yardstick for calibrating the authors' stimulus materials and for measuring the performance of their subjects, and the concepts and measures provided by the theory provide a quantitative way of getting at some of these questions.