• Corpus ID: 239016001

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matthew Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, Michael Bussmann, Sunita Chandrasekaran, Guido Juckeland
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the required portability, but, to existing codes, they themselves represent yet another API to port to. Here, we present our approach of porting the GPU… 

From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives
This paper discusses the experience with porting the SPEC ACCEL benchmarks from OpenACC to OpenMP 4.5 using a performance portable style that lets the compiler make platform-specific optimizations to achieve good performance on a variety of systems.
Alpaka -- An Abstraction Library for Parallel Kernel Acceleration
The Alpaka library defines and implements an abstract, hierarchical, redundant parallelism model that achieves platform and performance portability across various types of accelerators by ignoring unsupported levels of the hierarchy and using only those that a specific accelerator supports.
Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library
The general matrix multiplication (GEMM) algorithm is used as an example to show that Alpaka allows platform-specific tuning from a single source code, and to show the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Kokkos’ abstractions are described, its application programmer interface (API) is summarized, performance results for unit-test kernels and mini-applications are presented, and an incremental strategy for migrating legacy C++ codes to Kokkos is outlined.
Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries
It is found that CUDA and OpenCL work equally well for large problem sizes, while OpenCL has higher overhead for smaller problems.
std::tuple<> should be trivially constructible (May 2019)
  • 2019
Radiative signatures of the relativistic Kelvin-Helmholtz instability
  • Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. pp. 5:1–5:12. SC ’13, ACM, New York, NY, USA
  • 2013
OpenMP 5.1 API specification - atomic
RadeonOpenCompute fork of llvm-project