Performance Measurements Within Asynchronous Task-Based Runtime Systems: A Double White Dwarf Merger as an Application

  • Patrick Diehl, Dominic Marcello, Parsa Amini, Hartmut Kaiser, Sagiv Shiber, Geoffrey C. Clayton, Juhan Frank, Gregor Daiß, D. C. Pfluger, David C. Eder, Alice E. Koniges, Kevin A. Huck
  • Computing in Science & Engineering
Analyzing performance within asynchronous many-task-based runtime systems is challenging because millions of tasks are launched concurrently. For long-running simulations in particular, the volume of collected data becomes overwhelming. We study HPX and its performance-counter framework, together with the Autonomic Performance Environment for Exascale (APEX), to collect performance data and energy consumption. We added HPX application-specific performance counters to the Octo-Tiger full 3-D adaptive multigrid code astrophysics…

A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application that targets the new ARM A64FX processor and aims to automatize parts of the efforts in the OpenMP Advisor tool, which is built on top of existing and newly introduced LLVM tooling.
Octo-Tiger’s New Hydro Module and Performance Using HPX+CUDA on ORNL’s Summit
This work remodeled Octo-Tiger’s hydro solver to use a three-dimensional reconstruction scheme and ported the hydro solver to GPUs using CUDA kernels.


Assessing the Performance Impact of using an Active Global Address Space in HPX: A Case for AGAS
  • P. Amini, H. Kaiser
  • Computer Science
    2019 IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
  • 2019
This research presents a method to assess the performance of AGAS and the amount of impact it has on the execution time of the Octo-Tiger application, and identifies the four most expensive AGAS operations in HPX.
Octo-Tiger: A new, 3D hydrodynamic code for stellar mergers that uses HPX parallelisation
  • Monthly Notices of the Royal Astronomical Society, Apr. 2021, doi: 10.1093/mnras/stab937
  • 2021
OCTO-TIGER is an astrophysics code to simulate the evolution of self-gravitating and rotating systems of arbitrary geometry based on the fast multipole method, using adaptive mesh refinement.
Implementation of Peridynamics utilizing HPX - the C++ standard library for parallelism and concurrency
This paper presents an implementation of the peridynamics EMU nodal discretization using the C++ Standard Library for Concurrency and Parallelism (HPX), an open-source asynchronous many-task runtime system.
Evolving R Coronae Borealis stars with MESA
  • Monthly Notices Roy. Astron. Soc., vol. 488
  • 2019
The R Coronae Borealis (RCB) stars are rare hydrogen-deficient, carbon-rich supergiants. They undergo extreme, irregular declines in brightness of many magnitudes due to the formation of thick
From Piz Daint to the Stars: Simulation of Stellar Mergers Using High-Level Abstractions
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2019
Octo-Tiger, a finite volume grid-based hydrodynamics simulation code with Adaptive Mesh Refinement which is unique in conserving both linear and angular momentum to machine precision, is developed and extended to heterogeneous GPU-accelerated supercomputers, demonstrating node-level performance and portability.
Coupling Exascale Multiphysics Applications: Methods and Lessons Learned
This paper presents a framework built on capabilities such as in-memory communication, workflow scheduling on HPC resources, and continuous performance monitoring. The framework connects in situ or online analysis, compression, and visualization to shorten the time between a run and the analysis of its science content.