Corpus ID: 235368350

LLAMA: The Low Level Abstraction For Memory Access

Bernhard Gruber, G. Amadio, J. Blomer, Alexander Matthes, R. Widera, M. Bussmann
The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is therefore ideally decoupled from the rest of a program. This can be accomplished via a zero-runtime-overhead abstraction layer, underneath which memory layouts can be…
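The decoupling described in the abstract can be illustrated with a minimal C++ sketch. It is not LLAMA's API: the names `Particle`, `AoS`, `SoA`, `weighted_sum`, and `make_view` are hypothetical and chosen only for illustration. The sketch assumes a record with two fields and shows one kernel, written once against an accessor interface, running unchanged on an array-of-structs and a struct-of-arrays layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record with two fields (illustration only, not LLAMA's API).
struct Particle {
    float x;
    float mass;
};

// Array-of-structs layout: whole records are stored contiguously.
struct AoS {
    std::vector<Particle> data;
    explicit AoS(std::size_t n) : data(n) {}
    float& x(std::size_t i) { return data[i].x; }
    float& mass(std::size_t i) { return data[i].mass; }
    std::size_t size() const { return data.size(); }
};

// Struct-of-arrays layout: each field lives in its own contiguous array.
struct SoA {
    std::vector<float> xs;
    std::vector<float> masses;
    explicit SoA(std::size_t n) : xs(n), masses(n) {}
    float& x(std::size_t i) { return xs[i]; }
    float& mass(std::size_t i) { return masses[i]; }
    std::size_t size() const { return xs.size(); }
};

// Kernel written once against the accessor interface. The layout is a
// template parameter, so the accessors resolve at compile time and can be inlined.
template <typename View>
float weighted_sum(View& v) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v.x(i) * v.mass(i);
    }
    return sum;
}

// Helper that fills a view of either layout with the same data:
// x(i) = i and mass(i) = 2 for every element.
template <typename View>
View make_view(std::size_t n) {
    View v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.x(i) = static_cast<float>(i);
        v.mass(i) = 2.0f;
    }
    return v;
}
```

Switching the layout then means changing only the type passed to `make_view` and `weighted_sum`; the kernel body stays the same. A real abstraction layer would generalize this to arbitrary record types and mappings rather than hand-written accessors.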

Abstraction for AoS and SoA layout in C++
This chapter presents an abstraction layer that allows switching between the AoS and SoA layouts in C++ without changing the data access syntax. Code thus becomes independent of the data layout, and performance improves when the layout is chosen to match the application's usage pattern.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Kokkos’ abstractions are described, its application programmer interface (API) is summarized, performance results for unit-test kernels and mini-applications are presented, and an incremental strategy for migrating legacy C++ codes to Kokkos is outlined.
Data Layout and SIMD Abstraction Layers: Decoupling Interfaces from Implementations
A lightweight C++ template-based framework provides the high-level representation most programmers use (AoS) on top of different data layouts suited to SIMD vectorization. Combining this approach with the Boost.SIMD/bSIMD libraries is shown to give performance similar to manual vectorization with intrinsics and, in almost all cases, better performance than automatic vectorization, without increasing code complexity.
Tearing Down the Memory Wall
A vision for the Erudite architecture that redefines the compute and memory abstractions such that memory bandwidth and capacity become first-class citizens along with compute throughput, tearing down the notorious memory wall that has plagued computer architecture for generations.
Alpaka -- An Abstraction Library for Parallel Kernel Acceleration
The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model achieves platform and performance portability across various types of accelerators by ignoring the levels a specific accelerator does not support and using only the ones it does.
A Survey of Different Approaches for Overcoming the Processor - Memory Bottleneck
A brief review is given of various memory-centric systems that implement different approaches to merging memory with the processing elements or placing it near them, together with an in-depth analysis of several well-known memory-centric systems.
RAJA: Portable Performance for Large-Scale Scientific Applications
  • D. A. Beckingsale, T. Scogland, +7 authors Brian S. Ryujin
  • Computer Science
  • 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)
  • 2019
RAJA, a portability layer that enables C++ applications to leverage various programming models, and thus architectures, with a single-source codebase, is described, and preliminary results using RAJA are presented.
Vc: A C++ library for explicit vectorization
The Vc library provides portability of the source code, allowing full utilization of the hardware's SIMD capabilities, without introducing any overhead, and was designed to support developers in the creation of portable vectorized code.
MultiArray: a C++ library for generic programming with arrays
The MultiArray library, a part of the Boost library collection, enhances a C++ programmer's tool set with versatile multi‐dimensional array abstractions that support idiomatic array operations and interoperate with C++ Standard Library containers and algorithms.
mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards
This paper describes the design and implementation of mdspan, a proposed C++ standard multidimensional array view (planned for inclusion in C++23), and lays out how the design addresses some of the core challenges of performance-portable programming, and how its customization points allow a seamless extension into areas not currently addressed by the C++ Standard.