Using Jupyter for Reproducible Scientific Workflows

@article{Beg2021UsingJF,
  title={Using Jupyter for Reproducible Scientific Workflows},
  author={Marijan Beg and Juliette Taka and Thomas Kluyver and Alexander Konovalov and Min Ragan-Kelley and Nicolas M. Thi{\'e}ry and Hans Fangohr},
  journal={Computing in Science \& Engineering},
  year={2021},
  volume={23},
  pages={36-46}
}
Literate computing has emerged as an important tool for computational studies and open science, with growing folklore of best practices. In this work, we report two case studies, one in computational magnetism and another in computational mathematics, where domain-specific software was exposed to the Jupyter environment. This enables high-level control of simulations and computation, interactive exploration of computational results, batch processing on HPC resources, and reproducible workflow… 

Citing papers

GSTools v1.3: A toolbox for geostatistical modelling in Python
TLDR
GSTools is a Python-based software suite for solving a wide range of geostatistical problems. It provides methods for generating random fields, performing kriging, and estimating variograms, among much else, and its use is demonstrated through a series of example applications.
Automated Eruption Forecasting at Frequently Active Volcanoes Using Bayesian Networks Learned From Monitoring Data and Expert Elicitation: Application to Mt Ruapehu, Aotearoa, New Zealand
Volcano observatory best practice recommends using probabilistic methods to forecast eruptions to account for the complex natural processes leading up to an eruption and communicating the inherent
Efficient isolation of rare B cells using next-generation antigen barcoding
TLDR
A streamlined method for the isolation and analysis of large numbers of antigen-specific B cells, including next-generation antigen barcoding and an integrated computational framework for B cell multi-omics, is presented.
SWIRRL. Managing Provenance-aware and Reproducible Workspaces
TLDR
A Web API that allows Virtual Research Environments to easily integrate such tools into their websites and re-purpose them for their users. It is built in cooperation with two research infrastructures in the fields of solid earth science and climate data modeling.
HPC-oriented Canonical Workflows for Machine Learning Applications in Climate and Weather Prediction
TLDR
A holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data, is envisioned. It relies on the raster datacube to provide data harmonization and fast, scalable data access, and on the Jupyter notebook as a single reproducible experiment.
Cloud-based framework for inter-comparing submesoscale-permitting realistic ocean models
TLDR
This paper presents a meta-modelling framework for estimating the temperature and %VR of the response of the Southern Ocean to major volcanic eruptions in the period of June 21 to July 21, 1997.
Eliciting Best Practices for Collaboration with Computational Notebooks
TLDR
A catalog of best practices for collaborative data science with computational notebooks is elicited, and notebook solutions are envisioned that would free data scientists from having to prioritize exploration and rapid prototyping over writing quality code.
Making Canonical Workflow Building Blocks Interoperable across Workflow Languages
TLDR
It is argued that such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR), in order to improve widespread adoption and reuse of computational methods across workflow language barriers.
The interkingdom horizontal gene transfer in 44 early diverging fungi boosted their metabolic, adaptive and immune capabilities
TLDR
It is shown that ancestrally aquatic fungi are generally more likely to acquire foreign genetic material than terrestrial ones. The study also examines how different fungal lineages vary in the number of xenologs, in their ecological associations, and in the molecular properties of the proteins encoded by the acquired genes.
Ubermag: Toward More Effective Micromagnetic Workflows
TLDR
A human-centered research environment called Ubermag is designed and developed. It can be extended to drive other micromagnetic packages from the same environment, and the complete simulation workflow, including definition, execution, and data analysis of simulation runs, can be performed within the same notebook environment.
