Bootstrapping In-Situ Workflow Auto-Tuning via Combining Performance Models of Component Applications

  • Tong Shu, Yanfei Guo, Justin M. Wozniak, Xiaoning Ding, Ian T. Foster, Tahsin M. Kurç
  • Published 16 August 2020
  • Computer Science
  • SC21: International Conference for High Performance Computing, Networking, Storage and Analysis
In an in-situ workflow, multiple components such as simulation and analysis applications are coupled with streaming data transfers. The multiplicity of possible configurations necessitates an auto-tuner for workflow optimization. Existing auto-tuning approaches are computationally expensive because many configurations must be sampled by running the whole workflow repeatedly in order to train the auto-tuner surrogate model or otherwise explore the configuration space. To reduce these costs, we… 

HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization

This work develops a novel variational-autoencoder-guided asynchronous Bayesian optimization method to tune HPC storage service parameters and shows that it is on par with state-of-the-art autotuning frameworks in speed and outperforms them in resource utilization and parallelization capabilities.

Distributed in-memory data management for workflow executions

SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering, is presented, and it is shown that, even when running data analyses for user steering, SchalaDB’s overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data.

Serving unseen deep learning models with near-optimal configurations: a fast adaptive search approach

Experiments show that Falcon can effectively reduce the search overhead for unseen DL models by up to 80% compared to state-of-the-art efforts.

Software Monsters: Quantifying, Reporting, and Controlling Composite Applications

It is proposed that fundamental software metrics can be brought into innovative programming models to address the construction and execution of scientific applications.

Practical Federated Learning Infrastructure for Privacy-Preserving Scientific Computing

  • Lesi Wang, Dongfang Zhao
  • Computer Science
  • 2022 IEEE/ACM International Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S)
  • 2022
This paper identifies three missing pieces of a scientific FL infrastructure: a native MPI programming interface that can be seamlessly integrated into existing scientific applications, an independent data layer for the FL system such that the user can pick the persistent medium of her own choice, and efficient encryption protocols that are optimized for scientific workflows.



In-situ workflow auto-tuning through combining component models

An in-situ workflow auto-tuning method, ALIC, is proposed; it integrates machine learning techniques with knowledge of in-situ workflow structures to enable automated workflow configuration with a limited number of performance measurements.

Auto-tuning Parameter Choices in HPC Applications using Bayesian Optimization

The effectiveness of HiPerBOt is demonstrated in tuning parameters that include compiler flags, runtime settings, and application-level options for several parallel codes, including Kripke, Hypre, LULESH, and OpenAtom.

Active-learning-based surrogate models for empirical performance tuning

An iterative parallel algorithm is presented that builds surrogate performance models for scientific kernels and workloads on single-core, multicore, and multinode architectures. An active learning heuristic popular in the literature on the sequential design of computer experiments is tailored to the unique parallel environment in order to identify the code variants whose evaluations have the best potential to improve the surrogate.

Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling

A novel parameter-value selection heuristic is proposed, which functions as a guideline for the experiment design, leveraging sparse performance-modeling, a technique that only needs a polynomial number of experiments per model parameter.

Autotuning in High-Performance Computing Applications

If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures.

In‐memory staging and data‐centric task placement for coupled scientific simulation workflows

A distributed data sharing and task execution framework is presented that co‐locates in‐memory data staging on application compute nodes to store data that needs to be shared or exchanged, and uses data‐centric task placement to map computations onto processor cores so that a large portion of the data exchanges can be performed using the intra‐node shared memory.

Minimizing the cost of iterative compilation with active learning

This work constructs 11 high-quality models which use a combination of optimization settings to predict the runtime of benchmarks from the SPAPT suite, and is able to reduce the training overhead by up to 26x compared to an approach with a fixed number of sample runs.

DataSpaces: an interaction and coordination framework for coupled simulation workflows

DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried, and accessed by other components and services via the space using semantically meaningful operators.

Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up

This paper targets an important class of applications that require combining HPC simulations with data analysis for online or real-time scientific discovery, and designs an end-to-end application-level approach to eliminating the interlocks and synchronizations present in existing methods.

Bootstrapping Parameter Space Exploration for Fast Tuning

This paper proposes a novel bootstrap scheme, called GEIST, for parameter space exploration to find performance-optimizing configurations quickly and shows the effectiveness of GEIST for selecting application input options, compiler flags, and runtime/system settings for several parallel codes.