Good enough practices in scientific computing

@article{Wilson2016GoodEP,
  title={Good enough practices in scientific computing},
  author={Greg Wilson and Jennifer Bryan and Karen A. Cranston and Justin Kitzes and Lex Nederbragt and Tracy K. Teal},
  journal={PLoS Computational Biology},
  year={2017},
  volume={13}
}
Author summary: Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for…

The importance of good coding practices for data scientists

Key aspects of coding practices (both good and bad), focusing primarily on the R language, are described, though similar standards are applicable to other software environments.

Short-format Workshops Build Skills and Confidence for Researchers to Work with Data

Results show that these two-day coding workshops increase researchers’ daily programming usage and that sixty-five percent of respondents gained confidence in working with data and open-source tools as a result of completing the workshop.

Approachable Case Studies Support Learning and Reproducibility in Data Science: An Example from Evolutionary Biology

Research reproducibility is essential for scientific development. Yet, rates of reproducibility are low. As increasingly more research relies on computers and software, efforts for improving…

Principles for data analysis workflows

Suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support both for students new to research and for current researchers who are new to data-intensive work.

Exchanging Best Practices for Supporting Computational and Data-Intensive Research, The Xpert Network

Describes the Xpert Network initiative, in which participants exchange best practices, tools, successes, challenges, and general information about their activities, leading to increased productivity, efficiency, and coordination in the ever-growing community of scientists who use computational and data-intensive research methods.

Best Practices in Structuring Data Science Projects

  • J. Rybicki
  • Computer Science
    Advances in Intelligent Systems and Computing
  • 2018
This paper surveys three sources of information on how to structure projects: common management methodologies, community best practices, and data sharing platforms and provides hints on tools that can be helpful for managing such structures in an efficient manner.

Excuse Me, Do You Have a Moment to Talk About Version Control?

  • J. Bryan
  • Computer Science
    PeerJ Prepr.
  • 2017
The use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows is described, with special attention given to projects that use the statistical language R and, optionally, R Markdown documents.

Documenting research software in engineering science

Addresses the hypothesis that scientists do document their software but do not know exactly what they need to document, why, and for whom, and that the big picture of what documentation of research software means is missing.

Towards computational reproducibility: researcher perspectives on the use and sharing of software

It is found that researchers create, use, and share software in a wide variety of forms for a wide range of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation.

Key Attributes of a Modern Statistical Computing Tool

A modern statistical computing tool should be accessible, provide easy entry, privilege data as a first-order object, support exploratory and confirmatory analysis, allow for flexible plot creation, support randomization, be interactive, include inherent documentation, support narrative, publishing, and reproducibility, and be flexible to extensions.
...

References

A Quick Guide to Organizing Computational Biology Projects

The purpose of this article is to describe one good strategy for carrying out computational experiments, and to focus on relatively mundane issues such as organizing files and directories and documenting progress.
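
As a rough, hedged illustration of the kind of layout the article advocates (the directory names below are assumptions for illustration, not Noble's exact prescription), a minimal Python sketch that scaffolds one such project:

```python
# Minimal sketch: scaffold a per-project directory layout of the kind the
# article recommends. The directory names are illustrative assumptions.
from pathlib import Path


def scaffold_project(root: str) -> None:
    """Create a skeleton with separate homes for data, code, results, and notes."""
    base = Path(root)
    for sub in ("data", "src", "results", "doc"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    readme = base / "README.md"
    if not readme.exists():
        # A running lab-notebook-style file: what was done, when, and why.
        readme.write_text(
            "# Project notes\n\nRecord each experiment's date, inputs, and commands here.\n"
        )


if __name__ == "__main__":
    scaffold_project("example_project")  # hypothetical project name
```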

Ten Simple Rules for Reproducible Computational Research

It is emphasized that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher.

Ten Simple Rules for Digital Data Storage

“Data curation practices must continue to keep pace with the changes brought about by new forms and practices of data collection and storage,” according to the authors.

Clean Code - a Handbook of Agile Software Craftsmanship

Noted software expert Robert C. Martin, who has helped bring agile principles from a practitioner's point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code on the fly into a book that will instill within you the values of a software craftsman.

Code complete - a practical handbook of software construction, 2nd Edition

This book focuses on programming technique rather than the requirements of a specific programming language or environment. Topics include front-end planning, applying good design techniques to construction, using data effectively, using common and advanced control structures, secrets of self-documenting code, and testing and debugging techniques.

The Checklist Manifesto: How to Get Things Right

Understanding Open Source and Free Software Licensing

This concise guide focuses on annotated licenses, offering an in-depth explanation of how they compare and interoperate, and how license choices affect project possibilities.

Nine simple ways to make it easier to (re)use your data

Nine simple ways to make the data you share easier for others to reuse, and easier for you yourself to understand and work with.

Best Practices for Scientific Computing

We describe a set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software.

Code and Data for the Social Sciences: A Practitioner's Guide

Uses a running example, testing whether per capita potato chip consumption in a county is correlated with the average per capita potato chip consumption among other counties in the same state, to illustrate how to eliminate redundancy and improve clarity in analysis code.
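
As a hedged sketch only, the check described above could look like the following in pandas; the DataFrame, column names, and numbers are invented for illustration and do not come from the guide:

```python
# Illustrative sketch of the correlation described above; data are made up.
import pandas as pd

counties = pd.DataFrame({
    "state": ["WI", "WI", "WI", "ID", "ID", "ID"],
    "chips_per_capita": [4.1, 3.8, 4.4, 2.9, 3.1, 2.7],
})

# For each county, the average consumption among the *other* counties in
# its state (a leave-one-out state mean).
grp = counties.groupby("state")["chips_per_capita"]
state_sum = grp.transform("sum")
state_n = grp.transform("count")
counties["others_mean"] = (state_sum - counties["chips_per_capita"]) / (state_n - 1)

# Correlation between a county's own consumption and its peers' average.
print(counties[["chips_per_capita", "others_mean"]].corr().iloc[0, 1])
```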