REANA: A System for Reusable Research Data Analyses
@article{Simko2019REANAAS,
  title   = {REANA: A System for Reusable Research Data Analyses},
  author  = {Tibor Simko and Lukas Heinrich and Harri Hirvonsalo and Dinos Kousidis and Diego Rodr{\'i}guez Rodr{\'i}guez},
  journal = {EPJ Web of Conferences},
  year    = {2019}
}
The revalidation, reinterpretation and reuse of research data analyses requires having access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps which were used by researchers to produce the original scientific results in the first place.
REANA (Reusable Analyses) is a nascent platform that enables researchers to structure their research data analyses with future reuse in mind. The analysis is described by means of a…
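The sentence above is truncated, but it refers to REANA's structured analysis specification, which ties together the computing environment, datasets, software, and workflow steps listed earlier. As a hedged illustration only (the file names, parameter names, and values below are placeholders, not taken from the paper), a minimal `reana.yaml` specification can look roughly like this:

```yaml
# Hypothetical minimal REANA specification (reana.yaml) — a sketch;
# file names and parameter values are illustrative placeholders.
inputs:
  files:
    - code/analysis.py        # the analysis software
    - data/events.csv         # the experimental dataset
  parameters:
    nevents: 1000             # example runtime parameter
workflow:
  type: serial                # REANA also supports CWL and Yadage workflows
  specification:
    steps:
      - environment: 'python:3.8'   # computing environment as a container image
        commands:
          - python code/analysis.py --input data/events.csv --nevents ${nevents}
outputs:
  files:
    - results/plot.png        # the scientific result to preserve
```

Capturing all four ingredients declaratively in one file is what allows the platform to re-instantiate and re-execute the analysis later on a container-based backend.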
27 Citations
Support for HTCondor high-Throughput Computing Workflows in the REANA Reusable Analysis Platform
- Computer Science · 2019 15th International Conference on eScience (eScience)
- 2019
The results show that the REANA platform would be able to support hybrid scientific workflows where different parts of the analysis pipelines can be executed on multiple computing backends.
Enabling Seamless Execution of Computational and Data Science Workflows on HPC and Cloud with the Popper Container-native Automation Engine
- Computer Science · 2020 2nd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)
- 2020
Popper is presented, a container-native workflow engine that does not assume the presence of a Kubernetes cluster or any service other than a container engine such as Docker or Podman, enabling users to focus only on writing workflow logic.
Ten simple rules for writing Dockerfiles for reproducible data science
- Computer Science · PLoS Comput. Biol.
- 2020
A set of rules is presented to help researchers write understandable Dockerfiles for typical data science workflows, so that they can create containers suitable for sharing with fellow scientists, for inclusion in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
CERN Analysis Preservation and Reuse Framework: FAIR research data services for LHC experiments
- Computer Science · EPJ Web of Conferences
- 2020
The importance of annotating the deposited content with high-level structured information about physics concepts in order to promote information discovery and knowledge sharing inside the collaboration is discussed.
Hybrid analysis pipelines in the REANA reproducible analysis platform
- Computer Science · EPJ Web of Conferences
- 2020
The present work introduces support for hybrid analysis workflows in the REANA reproducible analysis platform and paves the way towards studying underlying performance advantages and challenges associated with hybrid analysis patterns in complex particle physics data analyses.
Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds
- Computer Science · Frontiers in Big Data
- 2021
It is argued that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
Making Reproducible Research Data by Utilizing Persistent ID Graph Structure
- Computer Science · 2020 IEEE International Conference on Big Data and Smart Computing (BigComp)
- 2020
A system is designed that can reproduce the data analysis and research environment of researchers who published existing papers; it utilizes a PID graph that connects papers, research data, and software.
Building a Kubernetes infrastructure for CERN’s Content Management Systems
- Computer Science · EPJ Web of Conferences
- 2021
This work designed a new Web Frameworks platform by extending Kubernetes, replacing the ageing physical infrastructure and reducing the dependency on homebrew components; the new system's open-source design is presented and contrasted with the one it replaces, demonstrating how it drastically reduced technical debt.
Science Capsule - Capturing the Data Life Cycle
- Computer Science, Biology · J. Open Source Softw.
- 2021
Reproducibility of scientific data and workflows facilitates efficient processing and analysis; a key to enabling reproducibility is to capture the end-to-end workflow life cycle, along with any contextual metadata and provenance.
Neuroscience Cloud Analysis As a Service
- Computer Science · bioRxiv
- 2020
NeuroCAAS is a fully automated analysis platform that makes state-of-the-art data analysis tools accessible to the neuroscience community and removes barriers to fast, efficient cloud computation, which can dramatically accelerate both the dissemination and the effective use of cutting-edge analysis tools for neuroscientific discovery.
References
Common Workflow Language, v1.0
- Computer Science
- 2016
The Common Workflow Language (CWL) is designed to express workflows for data-intensive science, such as Bioinformatics, Medical Imaging, Chemistry, Physics, and Astronomy.
Workflow Patterns
- Computer ScienceDistributed and Parallel Databases
- 2004
A number of workflow patterns are described that, the authors believe, identify comprehensive workflow functionality, providing the basis for an in-depth comparison of a number of commercially available workflow management systems.
Yadage and Packtivity - analysis preservation using parametrized workflows
- Computer Science
- 2017
This work argues for a declarative description in terms of individual processing steps - packtivities - linked through a dynamic directed acyclic graph (DAG) and presents an initial set of JSON schemas for such a description and an implementation capable of executing workflows of analysis preserved via Linux containers.
Status and Future Evolution of the ATLAS Offline Software
- Computer Science, Physics
- 2015
These proceedings give a summary of the many software upgrade projects undertaken to prepare ATLAS for the challenges of Run-2 of the LHC. Those projects include a significant reduction of the CPU…
Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics
- Physics · ArXiv
- 2012
An analysis of the research case for data preservation and a detailed description of the various projects at experiment, laboratory and international levels are provided, together with a concrete proposal for an international organisation in charge of data management and policies in high-energy physics.
HEP Software Foundation Community White Paper Working Group - Data and Software Preservation to Enable Reuse
- Computer Science, Physics
- 2018
In this chapter of the High Energy Physics Software Foundation Community Whitepaper, we discuss the current state of infrastructure, best practices, and ongoing developments in the area of data and…
The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements
- Medicine · PLoS ONE
- 2012
The main conclusion is that users are discouraged from updating to a new major release of either FreeSurfer or the operating system, or from switching to a different type of workstation, without repeating the analysis; the results thus give quantitative support to successive recommendations stated by FreeSurfer developers over the years.
Open is not enough
- Physics · Nature Physics
- 2018
The solutions adopted by the high-energy physics community to foster reproducible research are examples of best practices that could be embraced more widely. This first experience suggests that…
Search for supersymmetry in final states with missing transverse momentum and multiple b-jets in proton-proton collisions at $\sqrt{s}=13$ TeV with the ATLAS detector
- Physics
- 2018
A search for supersymmetry involving the pair production of gluinos decaying via third-generation squarks into the lightest neutralino $\tilde{\chi}_1^0$ is reported.…