Porting WarpX to GPU-accelerated platforms

Authors: Andrew Myers, Ann S. Almgren, Lígia Diana Amorim, John B. Bell, Luca Fedeli, Lixin Ge, Kevin Gott, David P. Grote, M. J. Hogan, Axel Huebl, Revathi Jambunathan, Rémi Lehe, Cho Ng, M. Rowan, Olga Shapoval, Maxence Thévenet, Jean-Luc Vay, Henri Vincenti, E. Yang, N. Zaïm, W. Zhang, Y. Zhao, Edoardo Zoni


Modeling of advanced accelerator concepts
Computer modeling is essential to research on Advanced Accelerator Concepts (AAC), as well as to their design and operation. This paper summarizes the current status and future needs of AAC systems
PICSAR-QED: a Monte Carlo module to simulate Strong-Field Quantum Electrodynamics in Particle-In-Cell codes for exascale architectures
Physical scenarios where the electromagnetic fields are so strong that Quantum ElectroDynamics (QED) plays a substantial role are one of the frontiers of contemporary plasma physics research.
HiPACE++: a portable, 3D quasi-static Particle-in-Cell code
S. Diederichs, C. Benedetti, A. Huebl, R. Lehe, A. Myers, A. Sinn, J.-L. Vay, W. Zhang, and M. Thévenet. Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany; Lawrence
Probing Strong-Field QED with Doppler-Boosted Petawatt-Class Lasers.
The scheme relies on relativistic plasma mirrors curved by radiation pressure to boost the intensity of petawatt-class laser pulses by the Doppler effect and focus them to extreme field intensities, and shows that very clear SF QED signatures could be observed by placing a secondary target where the boosted beam is focused.
libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations
Almost all applications stop scaling at some point; those that don't are seldom performant when considering time to solution on anything but aspirational/unicorn resources. Recognizing these


RAJA: Portable Performance for Large-Scale Scientific Applications
RAJA, a portability layer that enables C++ applications to leverage various programming models, and thus architectures, with a single-source codebase, is described, along with preliminary results obtained using it.
PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster
The simulation code PIConGPU presented in this paper is, to the authors' knowledge, the first scalable GPU cluster implementation of the PIC algorithm in plasma physics.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
A new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm is presented.
The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems
The design and key differences of the Summit and Sierra systems are discussed, and several CPU-, network-, and memory-bound analytics codes and GPU-bound deep learning codes achieve speedups per node of up to 11X and 79X, respectively, over Titan.
Single-pass Parallel Prefix Scan with Decoupled Lookback
We describe a work-efficient, communication-avoiding, single-pass method for the parallel computation of prefix scan. When consuming input from memory, our algorithm requires only ~2n data movement: n
Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: A First Look at Knights Landing
The optimized version of the particle-in-cell plasma simulation code PICADOR achieves 100 GFLOPS double-precision performance on a Knights Landing device, with speedups of 2.35x compared to a 14-core Haswell CPU and 3.47x compared to a 61-core Knights Corner Xeon Phi.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Kokkos’ abstractions are described, its application programmer interface (API) is summarized, performance results for unit-test kernels and mini-applications are presented, and an incremental strategy for migrating legacy C++ codes to Kokkos is outlined.
Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
A methodology is described to construct a hierarchical Roofline on NVIDIA GPUs, extend it to support reduced precision and Tensor Cores, and analyze three proxy applications: GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow.
AMReX: Block-structured adaptive mesh refinement for multiphysics applications
The core elements of the AMReX framework such as data containers and iterators as well as several specialized operations to meet the needs of the application projects are discussed, including the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.
Extended particle-in-cell schemes for physics in ultrastrong laser fields: Review and developments.
A modified event generator is proposed that precisely models the entire spectrum of incoherent particle emission without any low-energy cutoff, and which imposes close to the weakest possible demands on the numerical time step.