Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

@inproceedings{Vincent2022StrongSO,
  title={Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems},
  author={Jonathan Vincent and Jing Gong and Martin Karp and Adam Peplinski and Niclas Jansson and Artur Podobas and Andreas Jocksch and Jie Yao and Fazle Hussain and Stefano Markidis and Matts Karlsson and Dirk Pleiter and Erwin Laure and Philipp Schlatter},
  booktitle={International Conference on High Performance Computing in Asia-Pacific Region},
  year={2022}
}
  • J. Vincent, Jing Gong, P. Schlatter
  • Published 8 September 2021
  • Computer Science
  • International Conference on High Performance Computing in Asia-Pacific Region
We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers Reτ = 360 and Reτ = 550, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC’s… 
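Strong scaling keeps the problem size fixed (for example, one pipe mesh at a given Reτ) while the GPU count grows, and is usually summarized by speedup and parallel efficiency relative to the smallest run. The short Python sketch below computes both from wall-clock timings; the function name `strong_scaling` and the example timings are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Tuple


def strong_scaling(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (gpus, speedup, efficiency) rows relative to the smallest run.

    For a fixed problem size and N0 = smallest GPU count:
        speedup    S(N) = T(N0) / T(N)
        efficiency E(N) = S(N) * N0 / N   (1.0 means ideal linear scaling)
    """
    if not timings:
        raise ValueError("need at least one timing")
    base_n = min(timings)          # reference GPU count N0
    base_t = timings[base_n]       # reference wall-clock time T(N0)
    rows = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        efficiency = speedup * base_n / n
        rows.append((n, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical seconds per time step at a fixed problem size (illustrative only).
    example = {4: 2.0, 8: 1.1, 16: 0.65, 32: 0.45}
    for n, s, e in strong_scaling(example):
        print(f"{n:>4} GPUs  speedup {s:5.2f}  efficiency {e:6.1%}")
```

With these made-up timings, going from 4 to 8 GPUs gives a speedup of about 1.82 and an efficiency of about 91%; efficiency typically drops as the work per GPU shrinks.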


HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark

TLDR
HipBone is a fully GPU-accelerated C++ implementation of the original NekBone CPU proxy application with several novel algorithmic and implementation improvements that optimize its performance on modern fine-grained parallel GPU accelerators.

References

Showing 1-10 of 54 references

On the Strong Scaling of the Spectral Element Solver Nek5000 on Petascale Systems

TLDR
The present work is targeted at performing a strong scaling study of the high-order spectral element fluid dynamics solver Nek5000, and quantifies the machine characteristics in order to better assess the scaling behaviors of the code.

OpenACC acceleration of the Nek5000 spectral element code

TLDR
A case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system; profiling of NekBone provided an assessment of the code's suitability for GPU systems and indicated possible kernel optimizations.

OpenACC acceleration for the PN-PN-2 algorithm in Nek5000

A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations

TLDR
A hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000, is presented; the implementation keeps modifications to the existing CPU code to a minimum while extending the code's simulation capability to GPU architectures.

Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics

TLDR
Neko is presented, a portable framework for high-order spectral element flow simulations that adopts a modern object-oriented approach, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors down to exotic vector processors and FPGAs.

NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver

TLDR
The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM) is described, and performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit.

Scalability of high-performance PDE solvers

TLDR
This article considers a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms, and measures peak performance and identifies effective code optimization strategies for each architecture.

Optimization of Tensor-product Operations in Nekbone on GPUs

TLDR
This work further optimizes the main tensor-product operation in Nekbone in CUDA and obtains 77–92% of the peak performance on both Nvidia P100 and V100 GPUs for inputs with 1024–4096 elements and polynomial degree 9.
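Tensor-product (sum-factorized) operations apply a small 1D matrix along each index direction of an element's nodal values instead of forming a dense 3D operator. The pure-Python sketch below illustrates the idea for one element; the helper name `local_grad` and the `u[i][j][k]` layout are hypothetical choices of ours, and this is not the Nekbone source or the CUDA kernels described in that work.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]


def local_grad(d: Matrix, u: Tensor3) -> Tuple[Tensor3, Tensor3, Tensor3]:
    """Apply the n x n matrix d along each of the three index directions of u.

    Hypothetical helper for illustration. Returns (ur, us, ut) with
        ur[i][j][k] = sum_l d[i][l] * u[l][j][k]
        us[i][j][k] = sum_l d[j][l] * u[i][l][k]
        ut[i][j][k] = sum_l d[k][l] * u[i][j][l]
    Each output costs O(n^4) multiply-adds for n points per direction,
    versus O(n^6) for a dense matrix acting on all n^3 nodes.
    """
    n = len(d)
    rng = range(n)
    ur = [[[sum(d[i][l] * u[l][j][k] for l in rng) for k in rng] for j in rng] for i in rng]
    us = [[[sum(d[j][l] * u[i][l][k] for l in rng) for k in rng] for j in rng] for i in rng]
    ut = [[[sum(d[k][l] * u[i][j][l] for l in rng) for k in rng] for j in rng] for i in rng]
    return ur, us, ut


if __name__ == "__main__":
    # Sanity check: the identity matrix leaves the nodal values unchanged.
    n = 3
    eye = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    u = [[[float(i + 10 * j + 100 * k) for k in range(n)] for j in range(n)] for i in range(n)]
    ur, us, ut = local_grad(eye, u)
    print(ur == u and us == u and ut == u)
```

In practice `d` would be a spectral differentiation matrix for polynomial degree p (n = p + 1); the identity is used here only as a quick correctness check.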
...