## J. Comput. Phys

- G Chen, L Chacón, D C Barnes
- J. Comput. Phys
- 2012


Recently, an implicit, nonlinearly consistent, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic electrostatic simulations [1]. The method employs a Jacobian-free Newton–Krylov (JFNK) solver, capable of using very large time steps for the field evolution without loss of numerical stability or accuracy. A fundamental feature of the method is the nonlinear elimination of particle quantities via particle enslavement, so that particle-orbit computations are segregated from the field solver while remaining fully self-consistent. This, in turn, enables the effective use of GPU (graphics processing unit) computing for the particle push step. The particle-orbit integration is critically important for both the accuracy and the efficiency of the whole algorithm. In this talk, we present two novel implicit particle movers that enforce discrete charge conservation exactly and automatically. The first one employs a finite-difference adaptive Crank-Nicolson scheme [1], which is ideally suited for GPU computing [2]. The second one computes particle orbits analytically for a given piecewise-linear electric field [3], thus avoiding the need for adaptivity. Neither particle mover introduces numerical dissipation, allowing the overall algorithm to be exactly energy-conserving. However, the analytical mover is shown to significantly enhance the robustness of the overall nonlinear solution algorithm for small particle numbers and large time steps. We have developed an efficient, mixed-precision hybrid CPU–GPU implementation of the 1D implicit PIC algorithm that exploits the potential of the Crank-Nicolson implicit particle mover [2]. The JFNK solver is kept on the CPU in double precision (DP), and the Crank-Nicolson particle mover is implemented on a GPU using CUDA in single precision (SP). Performance-oriented optimizations are introduced with the aid of the roofline model [4].
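
To make the dissipation-free property of a Crank-Nicolson mover concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single particle in a prescribed, globally linear field E(x) = -k x (a simplification of the piecewise-linear field mentioned above), solves the implicit update by fixed-point iteration, and omits the adaptive sub-stepping and charge deposition of the actual scheme. The names `cn_push`, `efield`, `qm`, and `k` are hypothetical.

```python
def cn_push(x, v, efield, qm, dt, tol=1e-13, max_iter=100):
    """One Crank-Nicolson (implicit midpoint) step for a single particle.

    Solves  v_new = v + dt * qm * E(x_half),  x_new = x + dt * (v + v_new) / 2,
    with x_half = (x + x_new) / 2, by fixed-point (Picard) iteration on v_new.
    Assumption: the field is prescribed, not self-consistently updated.
    """
    v_new = v
    for _ in range(max_iter):
        # Midpoint position implied by the current velocity iterate.
        x_half = x + 0.25 * dt * (v + v_new)
        v_next = v + dt * qm * efield(x_half)
        converged = abs(v_next - v_new) < tol
        v_new = v_next
        if converged:
            break
    x_new = x + 0.5 * dt * (v + v_new)
    return x_new, v_new


def energy(x, v, k):
    """Kinetic plus potential energy for unit mass and force -k * x."""
    return 0.5 * v * v + 0.5 * k * x * x


k = 1.0  # hypothetical field gradient


def efield(pos):
    """Linear restoring field, an assumed stand-in for the real field."""
    return -k * pos


x, v = 1.0, 0.3
e0 = energy(x, v, k)
for _ in range(1000):
    x, v = cn_push(x, v, efield, qm=1.0, dt=0.5)

# Relative energy drift after 1000 large steps (dt = 0.5).
drift = abs(energy(x, v, k) - e0) / e0
print(f"relative energy drift: {drift:.2e}")
```

For a linear field the midpoint update conserves the quadratic energy up to the iteration tolerance and round-off, so the printed drift should stay near machine precision even though the time step is a sizeable fraction of the oscillation period.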
The Crank-Nicolson particle mover is shown to achieve up to 400 GOp/s on an Nvidia GeForce GTX 580. This corresponds to an absolute GPU efficiency of 25% relative to the theoretical peak performance, and is about 100 times faster than an equivalent compiler-optimized execution on a single CPU core (Intel Xeon X5460). For a challenging long-timescale ion acoustic wave simulation, the mixed-precision hybrid CPU–GPU solver is shown to outperform the DP CPU-only serial version by a factor of about 100, without apparent loss of robustness or accuracy.
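
The performance figures above can be related to the roofline model with a short calculation. This is a sketch under stated assumptions: the peak is inferred only from the two quoted numbers (400 GOp/s at 25% efficiency), while the memory bandwidth is an assumed nominal value for this GPU that does not come from the abstract. The names `roofline` and `ASSUMED_BANDWIDTH_GBS` are hypothetical.

```python
def roofline(peak_gops, bandwidth_gbs, intensity):
    """Attainable throughput (GOp/s) under the roofline model.

    intensity is the arithmetic intensity in operations per byte moved.
    """
    return min(peak_gops, bandwidth_gbs * intensity)


achieved_gops = 400.0   # from the text
efficiency = 0.25       # from the text
implied_peak_gops = achieved_gops / efficiency  # peak implied by the two figures

# Assumption: nominal memory bandwidth in GB/s, not stated in the abstract.
ASSUMED_BANDWIDTH_GBS = 192.4

# Arithmetic intensity above which a kernel stops being bandwidth-bound.
ridge_intensity = implied_peak_gops / ASSUMED_BANDWIDTH_GBS

# Minimum intensity a kernel needs for the bandwidth roof to allow 400 GOp/s.
min_intensity_for_achieved = achieved_gops / ASSUMED_BANDWIDTH_GBS

print(f"implied peak: {implied_peak_gops:.0f} GOp/s")
print(f"ridge point: {ridge_intensity:.2f} op/byte")
print(f"intensity needed for 400 GOp/s: {min_intensity_for_achieved:.2f} op/byte")
```

Under these assumptions, a kernel can only reach the quoted throughput if it performs a couple of operations per byte of memory traffic, which is the kind of constraint the roofline model is used to expose.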

@inproceedings{Chen2012EfficientOI,
title={Efficient orbit integration in fully implicit particle-in-cell algorithms},
author={Guangye Chen and Luis Chac{\'o}n and D. C. Barnes},
year={2012}
}