GPU Algorithms for Efficient Exascale Discretizations

Authors: Ahmad Abdelfattah, Valeria Barra, Natalie N. Beams, Ryan C. Bleile, Jed Brown, Sylvain Camier, Robert Carson, Noel Chalmers, Veselin A. Dobrev, Yohann Dudouit, Paul Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio V. Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert N. Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Z. Tomov, Arturo Vargas, Tim Warburton, and Kenneth Weiss
Journal: Parallel Comput.
In this paper we describe the research and development activities in the Center for Efficient Exascale Discretizations (CEED) within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED…
Highly Optimized Full-Core Reactor Simulations on Summit
Nek5000/RS is a highly performant open-source spectral element code for simulation of incompressible and low-Mach fluid flow, heat transfer, and combustion, with a particular focus on turbulent flows.
Entropy Stable Discontinuous Galerkin Methods for Balance Laws in Non-Conservative Form: Applications to Euler with Gravity
A semi-discretely entropy stable discontinuous Galerkin method on curvilinear meshes is developed using a generalization of flux differencing for numerical fluxes in fluctuation form.


Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations
A hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000, are presented; the implementation minimizes modification of the existing CPU code while extending the code's simulation capability to GPU architectures.
Portable high-order finite element kernels I: Streaming Operations
This paper proposes a suite of new Benchmark Streaming tests to focus on the distinct streaming operations which must be performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements.
OpenACC acceleration for the PN-PN-2 algorithm in Nek5000
An OpenACC implementation is applied to the CFD code Nek5000 for simulation of incompressible flows, based on the spectral-element method for the spatial discretization of the Navier–Stokes equations.
Efficient exascale discretizations: High-order finite element methods
  • T. Kolev, P. Fischer, +27 authors V. Tomov
  • Computer Science, Mathematics
    The International Journal of High Performance Computing Applications
  • 2021
Efficient exploitation of exascale architectures requires rethinking the numerical algorithms used in many large-scale applications; this paper addresses that need with high-order finite element methods.
NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver
The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM), is described, and performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit.
An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication
Performance results and an analysis of a message passing interface/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations show more than 2.5× speedup over central processing unit-only performance on the same number of nodes.
High-Performance Tensor Contractions for GPUs
To accelerate large-scale tensor-formulated high-order finite element method (FEM) simulations, the main focus and motivation of this work, contractions are represented as tensor index reordering plus matrix-matrix multiplications (GEMMs).
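The idea of expressing a contraction as index reordering plus a GEMM can be illustrated with a minimal pure-Python sketch. It is not the paper's GPU implementation: the function names (`gemm`, `contract_first_index`, `contract_middle_index`), the flat row-major storage, and the cubic tensor shape are all assumptions made for illustration.

```python
def gemm(A, B, m, k, n):
    """Row-major C (m x n) = A (m x k) times B (k x n), stored as flat lists."""
    C = [0] * (m * n)
    for i in range(m):
        for p in range(k):
            a = A[i * k + p]
            for j in range(n):
                C[i * n + j] += a * B[p * n + j]
    return C


def contract_first_index(A, T, q, p):
    """C[a, j, k] = sum_i A[a, i] * T[i, j, k], with A (q x p), T (p x p x p).

    The contracted index already leads in T, so T can be viewed as a
    p x (p*p) matrix without moving data: one GEMM suffices.
    """
    return gemm(A, T, q, p, p * p)


def contract_middle_index(A, T, q, p):
    """C[i, a, k] = sum_j A[a, j] * T[i, j, k].

    Step 1: reorder T[i, j, k] -> Tp[j, i, k] so the contracted index leads.
    Step 2: one GEMM gives Cp[a, i, k].
    Step 3: reorder Cp[a, i, k] -> C[i, a, k].
    """
    Tp = [0] * (p * p * p)
    for i in range(p):
        for j in range(p):
            for k in range(p):
                Tp[(j * p + i) * p + k] = T[(i * p + j) * p + k]
    Cp = gemm(A, Tp, q, p, p * p)
    C = [0] * (p * q * p)
    for a in range(q):
        for i in range(p):
            for k in range(p):
                C[(i * q + a) * p + k] = Cp[(a * p + i) * p + k]
    return C


def contract_middle_direct(A, T, q, p):
    """Reference: the same contraction written as plain nested loops."""
    C = [0] * (p * q * p)
    for i in range(p):
        for a in range(q):
            for k in range(p):
                C[(i * q + a) * p + k] = sum(
                    A[a * p + j] * T[(i * p + j) * p + k] for j in range(p)
                )
    return C
```

In this sketch the reordering steps are explicit copies; the appeal of the GEMM formulation is that the arithmetic is delegated to a single, heavily optimized matrix-matrix kernel.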
Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors
A GPU parallelization of a matrix-free geometric multigrid iterative solver is developed, targeting moderate and high polynomial degrees and supporting general curved and adaptively refined hexahedral meshes with hanging nodes.
Initial Guesses for Sequences of Linear Systems in a GPU-Accelerated Incompressible Flow Solver
New initial guess methods based on stabilized polynomial extrapolation are proposed and compared to the projection method of Fischer, showing that they are generally competitive with projection schemes despite requiring only half the storage and performing considerably less data movement and communication.
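As a rough illustration of extrapolation-based initial guesses, the sketch below applies plain (unstabilized) polynomial extrapolation over uniformly spaced previous solutions. It is not the stabilized scheme the paper proposes, and the names `extrapolation_weights` and `extrapolated_guess` are hypothetical.

```python
from math import comb


def extrapolation_weights(order):
    """Weights w_j such that x_guess = sum_j w_j * x_{n-j}, j = 0..order.

    Assumes uniform spacing between previous solutions. The weights follow
    from setting the (order+1)-th finite difference to zero:
    w_j = (-1)^j * C(order + 1, j + 1).
    """
    return [(-1) ** j * comb(order + 1, j + 1) for j in range(order + 1)]


def extrapolated_guess(history, order):
    """Extrapolate the next solution vector from previous ones.

    history[0] is the most recent solution, history[1] the one before, etc.
    Each entry is a list of equal length.
    """
    if len(history) < order + 1:
        raise ValueError("need order + 1 previous solutions")
    w = extrapolation_weights(order)
    n = len(history[0])
    return [sum(w[j] * history[j][i] for j in range(order + 1)) for i in range(n)]
```

For example, with `order=2` the weights are 3, -3, 1, so three stored solutions produce a guess that is exact whenever each solution component varies quadratically in time.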
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs
New GPU implementations of the tensor contractions arising from basis-related computations for high-order finite element methods are presented and a tuned framework for choosing standard batch-BLAS GEMMs is developed that will maximize performance across groups of elements.