Workflows Community Summit: Bringing the Scientific Workflows Community Together

Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Daniel E. Laney, Dong H. Ahn, Shantenu Jha, Carole A. Goble, Lavanya Ramakrishnan, Luc Peterson, Bjoern Enders, Douglas Thain, Ilkay Altintas, Y. Babuji, Rosa M. Badia, Vivien Bonazzi, Tainã Coleman, Michael R. Crusoe, Ewa Deelman, Frank Di Natale, Paolo Di Tommaso, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Alex M. Ganose, Bjorn Gruning, Daniel S. Katz, Olga Anna Kuchar, Ana Kupresanin, Bertram Ludäscher, Ketan Maheshwari, Marta Mattoso, Kshitij Mehta, Todd S. Munson, Jonathan Ozik, Tom Peterka, Loïc Pottier, Timothy Randles, Stian Soiland-Reyes, Benjamín Tovar, Matteo Turilli, Thomas D. Uram, Karan Vahi, Michael Wilde, Matthew Wolf, Justin M. Wozniak
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity… 


Workflows Community Summit: Tightening the Integration between Computing Facilities and Scientific Workflows
The third edition of the "Workflows Community Summit" explored workflows challenges and opportunities from the perspective of computing centers and facilities.
WfChef: Automated Generation of Accurate Scientific Workflow Generators
It is found that the WfChef generators not only require zero development effort (because they are automatically produced), but also generate workflows that are more realistic than those generated by hand-crafted generators.
ExaWorks: Workflows for Exascale
The ExaWorks project is leading a co-design process to create a workflow Software Development Toolkit (SDK) consisting of a wide range of workflow management tools that can be composed and interoperate through common interfaces.
A Community Roadmap for Scientific Workflows Research and Development
This paper reports on discussions and findings from two virtual “Workflows Community Summits” (January and April, 2021) to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision.
Implementation-independent Knowledge Graph Construction Workflows using FnO Composition
This paper introduces an interoperable and reproducible solution for defining Knowledge Graph construction workflows leveraging Semantic Web technologies, and demonstrates that composing functions using the Function Ontology allows for functional descriptions of entire workflows, automatically executable using a Function Ontology handler implementation.
High-Performance Ptychographic Reconstruction with Federated Facilities
This work presents a system that unifies leadership computing and experimental facilities by enabling the automated establishment of data analysis pipelines that extend from edge data acquisition systems at synchrotron beamlines to remote computing facilities; under the covers, the system uses Globus Auth authentication to minimize user interaction.
Why it takes a village to manage and share data
Implementation plans for the National Institutes of Health policy for data management and sharing, which takes effect in 2023, provide an opportunity to reflect on the stakeholders, infrastructures,
User Experiences on Network Testbeds
Two surveys are administered to investigate and document possible obstacles in user interaction with network testbeds and show that most users overcome their initial orientational obstacles, but that implementational and domain-specific obstacles remain and should be addressed by testbeds through significant new developments.
The future of scientific workflows
This work highlights use cases, computing systems, and workflow needs, and concludes by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Workflow Patterns
A number of workflow patterns that the authors believe identify comprehensive workflow functionality are described, providing the basis for an in-depth comparison of a number of commercially available workflow management systems.
A common workflow registry of compute endpoints and applications
This document focuses on building shared elements that will increasingly be used as workflow elements in simulation, analysis, search, optimization, and parameter study research campaigns.
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs
A number of industrial projects are presented in which the modular CK approach was successfully validated in order to automate benchmarking, auto-tuning and co-design of efficient software and hardware for machine learning (ML) and artificial intelligence (AI) in terms of speed, accuracy, energy, size and various costs.
FAIR Computational Workflows
This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research
This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically, in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular, and population scales.
From desktop to Large-Scale Model Exploration with Swift/T
A framework is presented for combining existing model exploration capabilities and simulations with the Swift/T parallel scripting language to run scientific workflows on a variety of computing resources, from desktops to academic clusters to Top 500-level supercomputers.
The FAIR Guiding Principles for scientific data management and stewardship
This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide
This work presents a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories, and argues that putting in place specific policies such as those presented here will help scientific software registries better serve their users and their disciplines.