REANA: A System for Reusable Research Data Analyses

@article{Simko2019REANAAS,
  title={REANA: A System for Reusable Research Data Analyses},
  author={Tibor Simko and Lukas Heinrich and Harri Hirvonsalo and Dinos Kousidis and Diego Rodr{\'i}guez Rodr{\'i}guez},
  journal={EPJ Web of Conferences},
  year={2019}
}
The revalidation, reinterpretation and reuse of research data analyses require access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that researchers used to produce the original scientific results. REANA (Reusable Analyses) is a nascent platform that enables researchers to structure their research data analyses with future reuse in mind. The analysis is described by means of a…
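For concreteness, a REANA analysis is captured in a reana.yaml specification that enumerates the input files and parameters, the containerised environment, and the workflow steps. The sketch below follows the serial-workflow layout from the public REANA documentation; the file names, container image tag, and the nbins parameter are illustrative placeholders rather than content from the paper.

    # Minimal sketch of a reana.yaml serial workflow; all names are placeholders.
    inputs:
      files:
        - code/fit.py
        - data/events.csv
      parameters:
        nbins: 50
    workflow:
      type: serial                           # REANA also accepts CWL and Yadage specifications
      specification:
        steps:
          - environment: 'python:3.11-slim'  # container image pinning the runtime
            commands:
              - python code/fit.py --nbins ${nbins}
    outputs:
      files:
        - results/plot.png

Declaring the environment, data, code, and steps in one file is what allows the platform to re-instantiate the computing environment and re-execute the workflow on demand.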
Support for HTCondor High-Throughput Computing Workflows in the REANA Reusable Analysis Platform
TLDR
The results show that the REANA platform would be able to support hybrid scientific workflows where different parts of the analysis pipelines can be executed on multiple computing backends.
Enabling Seamless Execution of Computational and Data Science Workflows on HPC and Cloud with the Popper Container-native Automation Engine
TLDR
Popper is presented, a container-native workflow engine that does not assume the presence of a Kubernetes cluster or any service other than a container engine such as Docker or Podman, enabling users to focus only on writing workflow logic.
Ten simple rules for writing Dockerfiles for reproducible data science
TLDR
A set of rules to help researchers write understandable Dockerfiles for typical data science workflows are presented and researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
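The ten rules themselves are given in the cited paper; purely to illustrate the flavour of such advice, the minimal Dockerfile sketch below pins the base image and package versions so that later rebuilds reproduce the same environment. The image tag and package versions are placeholders chosen for the example, not recommendations taken from the paper.

    # Illustrative only: pin the base image and package versions so that
    # rebuilding the image later reproduces the same environment.
    FROM python:3.11.4-slim

    # Pin exact versions instead of installing whatever happens to be latest.
    RUN pip install --no-cache-dir numpy==1.26.4 pandas==2.2.2

    # Ship the analysis script and document a single-command entry point.
    COPY analysis.py /analysis.py
    CMD ["python", "/analysis.py"]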
CERN Analysis Preservation and Reuse Framework: FAIR research data services for LHC experiments
TLDR
The importance of annotating the deposited content with high-level structured information about physics concepts in order to promote information discovery and knowledge sharing inside the collaboration is discussed.
Hybrid analysis pipelines in the REANA reproducible analysis platform
TLDR
The present work introduces support for hybrid analysis workflows in the REANA reproducible analysis platform and paves the way towards studying underlying performance advantages and challenges associated with hybrid analysis patterns in complex particle physics data analyses.
Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds
TLDR
It is argued that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
Making Reproducible Research Data by Utilizing Persistent ID Graph Structure
TLDR
A system is designed that can reproduce the data analyses and research environments of researchers who published existing papers; it utilizes a PID graph that connects papers, research data, and software.
Building a Kubernetes infrastructure for CERN’s Content Management Systems
TLDR
This work designs a new Web Frameworks platform that extends Kubernetes to replace the ageing physical infrastructure and reduce the dependency on home-grown components; it presents the new system’s open-source design, contrasting it with the one it replaces and demonstrating how the change drastically reduced technical debt.
Science Capsule - Capturing the Data Life Cycle
TLDR
Reproducibility of scientific data and workflows facilitates efficient processing and analysis; a key to enabling reproducibility is capturing the end-to-end workflow life cycle, together with any contextual metadata and provenance.
Neuroscience Cloud Analysis As a Service
TLDR
NeuroCAAS is a fully automated analysis platform that makes state-of-the-art data analysis tools accessible to the neuroscience community and removes barriers to fast, efficient cloud computation, which can dramatically accelerate both the dissemination and the effective use of cutting-edge analysis tools for neuroscientific discovery.

References

Common Workflow Language, v1.0
TLDR
The Common Workflow Language (CWL) is designed to express workflows for data-intensive science, such as Bioinformatics, Medical Imaging, Chemistry, Physics, and Astronomy.
Workflow Patterns
TLDR
A number of workflow patterns addressing what the authors believe to be comprehensive workflow functionality are described, providing the basis for an in-depth comparison of a number of commercially available workflow management systems.
Yadage and Packtivity - analysis preservation using parametrized workflows
TLDR
This work argues for a declarative description in terms of individual processing steps - packtivities - linked through a dynamic directed acyclic graph (DAG), and presents an initial set of JSON schemas for such a description together with an implementation capable of executing analysis workflows preserved via Linux containers.
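As a rough illustration of the declarative step-plus-DAG idea (not the actual packtivity JSON schema), the toy Python sketch below represents each step as a small dictionary carrying a container environment and a command, declares dependencies explicitly, and visits the steps in topological order; every field name, script, and image tag is invented for the example.

    # Toy sketch of a declarative DAG of containerised steps, in the spirit of
    # yadage/packtivity; the step layout is illustrative, not the real schema.
    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    steps = {
        "generate": {"environment": "docker://python:3.11-slim",
                     "command": "python generate.py --out events.json",
                     "needs": []},
        "select":   {"environment": "docker://python:3.11-slim",
                     "command": "python select.py --in events.json --out selected.json",
                     "needs": ["generate"]},
        "fit":      {"environment": "docker://python:3.11-slim",
                     "command": "python fit.py --in selected.json",
                     "needs": ["select"]},
    }

    # Order the steps so that every dependency runs before its dependents.
    sorter = TopologicalSorter({name: set(s["needs"]) for name, s in steps.items()})
    for name in sorter.static_order():
        step = steps[name]
        print(f"[{step['environment']}] {step['command']}")  # stand-in for a container launch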
Status and Future Evolution of the ATLAS Offline Software
These proceedings give a summary of the many software upgrade projects undertaken to prepare ATLAS for the challenges of Run-2 of the LHC. Those projects include a significant reduction of the CPU…
Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics
TLDR
An analysis of the research case for data preservation and a detailed description of the various projects at experiment, laboratory and international levels are provided, together with a concrete proposal for an international organisation in charge of data management and policies in high-energy physics.
HEP Software Foundation Community White Paper Working Group - Data and Software Preservation to Enable Reuse
In this chapter of the High Energy Physics Software Foundation Community Whitepaper, we discuss the current state of infrastructure, best practices, and ongoing developments in the area of data and…
The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements
TLDR
The main conclusion is that users are discouraged from updating to a new major release of either FreeSurfer or the operating system, or from switching to a different type of workstation, without repeating the analysis; the results thus give quantitative support to successive recommendations stated by FreeSurfer developers over the years.
Open is not enough
The solutions adopted by the high-energy physics community to foster reproducible research are examples of best practices that could be embraced more widely. This first experience suggests that…
Search for supersymmetry in final states with missing transverse momentum and multiple b-jets in proton-proton collisions at $\sqrt{s}=13$ TeV with the ATLAS detector
A search for supersymmetry involving the pair production of gluinos decaying via third-generation squarks into the lightest neutralino $\tilde{\chi}_1^0$ is reported.
1,500 scientists lift the lid on reproducibility