Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

  • Janus Juul Eriksen
  • Molecular Physics, pp. 2086-2101
  • Published 26 September 2016
Abstract: It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the… 
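To make the idea concrete, here is a minimal sketch of a directive-accelerated RI-MP2 energy in mixed precision. It is not the paper's code. It assumes a closed-shell reference, a row-major three-index tensor `B[Q][i][a]` already fitted so that (ia|jb) ≈ Σ_Q B^Q_ia B^Q_jb, and occupied/virtual orbital energies `eo` and `ev`. The function name `rimp2_energy` and the data layout are hypothetical. The energy evaluated is E = Σ_ijab (ia|jb)[2(ia|jb) − (ib|ja)] / (ε_i + ε_j − ε_a − ε_b). Integrals are formed in single precision and the energy is accumulated in double precision, which is one simple way of mixing precisions. A compiler without OpenACC support ignores the pragmas and runs the loops serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Closed-shell RI-MP2 correlation energy (naive sketch, hypothetical API).
 * B  : float tensor of size nq*no*nv, assumed layout B[(q*no + i)*nv + a]
 * eo : occupied orbital energies, length no
 * ev : virtual orbital energies, length nv
 * (ia|jb) and (ib|ja) are built in single precision; the energy is
 * accumulated in double precision.
 */
double rimp2_energy(int no, int nv, int nq,
                    const float *restrict B,
                    const double *restrict eo,
                    const double *restrict ev)
{
    assert(no > 0 && nv > 0 && nq > 0);
    const size_t nb = (size_t)nq * no * nv;
    (void)nb; /* only referenced by the OpenACC data clause below */
    double e = 0.0;

    /* Offload the four outer loops as one flattened parallel loop and
       reduce the energy across all (i, j, a, b) quadruples. */
    #pragma acc parallel loop collapse(4) reduction(+:e) \
        copyin(B[0:nb], eo[0:no], ev[0:nv])
    for (int i = 0; i < no; ++i)
        for (int j = 0; j < no; ++j)
            for (int a = 0; a < nv; ++a)
                for (int b = 0; b < nv; ++b) {
                    float iajb = 0.0f, ibja = 0.0f; /* single precision */
                    #pragma acc loop seq
                    for (int q = 0; q < nq; ++q) {
                        const float *Bq = B + (size_t)q * no * nv;
                        iajb += Bq[i * nv + a] * Bq[j * nv + b];
                        ibja += Bq[i * nv + b] * Bq[j * nv + a];
                    }
                    double denom = eo[i] + eo[j] - ev[a] - ev[b];
                    /* promote to double before accumulating */
                    e += (double)iajb * (2.0 * (double)iajb - (double)ibja) / denom;
                }
    return e;
}
```

The explicit inner loop over Q keeps the directives easy to read. Practical implementations typically form the integral batches with (single-precision) matrix-matrix multiplications from a vendor BLAS instead. Accumulating in double precision guards against the round-off that builds up when many small single-precision contributions are summed.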
13 Citations
Integral-direct and parallel implementation of the CCSD(T) method: algorithmic developments and large-scale applications.
The efficiency of this implementation allowed us to perform some of the largest CCSD(T) calculations ever presented for systems of 31-43 atoms and 1037-1569 orbitals using only 4-8 many-core CPUs and 1-3 days of wall time.
Accurate Reduced-Cost CCSD(T) Energies: Parallel Implementation, Benchmarks, and Large-Scale Applications
The accurate and systematically improvable frozen natural orbital (FNO) and natural auxiliary function (NAF) cost-reducing approaches are combined with recent coupled-cluster singles, doubles, and perturbative triples implementations to create the practically “gold standard” quality FNO-CCSD(T) method.
GPU acceleration of rank-reduced coupled-cluster singles and doubles.
A graphics processing unit (GPU)-accelerated implementation of the recently introduced rank-reduced coupled-cluster singles and doubles (RR-CCSD) method, which introduces a low-rank approximation of the doubles amplitudes, is presented. The accuracy of RR-CCSD is tested for a variety of chemical systems, and errors below 0.1% in the correlation energy can be achieved.
Many-Body Quantum Chemistry on Massively Parallel Computers.
The deployment of many-body quantum chemistry methods onto massively parallel high-performance computing (HPC) platforms is reviewed. The particular focus is on highly accurate methods that have…
Performance of Coupled-Cluster Singles and Doubles on Modern Stream Processing Architectures.
We develop a new implementation of coupled-cluster singles and doubles (CCSD) optimized for the most recent graphical processing unit (GPU) hardware. We find that a single node with 8 NVIDIA V100…
Energy correction and analytic energy gradients due to triples in CCSD(T) with spin–orbit coupling on graphic processing units using single-precision data
Calculating the contribution of triples ((T)) to the correlation energy, the density matrices and the constant terms in the Λ equation are the most expensive steps in obtaining analytic…
Optimization of the linear-scaling local natural orbital CCSD(T) method: Redundancy-free triples correction using Laplace transform.
An improved algorithm is presented for the evaluation of the (T) correction as part of the local natural orbital (LNO) coupled-cluster singles and doubles with perturbative triples scheme, enabling the computation of LNO-CCSD(T) correlation energies with at least triple-zeta quality basis sets for realistic three-dimensional molecules.
Approaching the basis set limit of CCSD(T) energies for large molecules with local natural orbital coupled-cluster methods.
It is demonstrated that the complete basis set limit (CBS) of LNO-CCSD(T) energies can be reliably approached via basis set extrapolation using large basis sets including diffuse functions.
GPU‐Accelerated Large‐Scale Excited‐State Simulation Based on Divide‐and‐Conquer Time‐Dependent Density‐Functional Tight‐Binding
Numerical applications confirmed that the present GPU code significantly accelerated the TDDFTB calculations while maintaining accuracy. The DC‐TDDFTB simulation of 2‐acetylindan‐1,3‐dione displays excited‐state intramolecular proton transfer and yields absorption and fluorescence energies in reasonable agreement with the corresponding experimental values.
Single-precision open-shell CCSD and CCSD(T) calculations on graphics processing units.
It has been shown that coupled-cluster calculations with single-precision data are able to provide correlation energies with insignificant loss of accuracy. In this work, we employed consumer GPUs to…


Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.
This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems, and finds that the GPU outperforms the Phi for both square and nonsquare matrix multiplications.
Generating Efficient Quantum Chemistry Codes for Novel Architectures.
It is suggested that the meta-programming and empirical performance optimization approach may be important in future computational chemistry applications, especially in the face of quickly evolving computer architectures.
GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems.
It is demonstrated that a simple regularization of the cluster amplitudes used in the noniterative corrections accounting for the effect of triply excited configurations significantly improves the accuracies of ground-state energies in the presence of strong quasidegeneracy effects.
MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.
In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers.
Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.
In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore…
Density-fitted singles and doubles coupled cluster on graphics processing units
We adapt an algorithm for singles and doubles coupled cluster (CCSD) that uses density fitting or Cholesky decomposition (CD) in the construction and contraction of all electron repulsion integrals…
Parallel Programming with OpenACC
Parallel Programming with OpenACC explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas, and presents the simplest way to leverage GPUs to achieve application speedups.
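As a small illustration of the directive style described above, the following generic sketch shows a common OpenACC pattern. It is not an example taken from the book, and the function name `scale_then_axpy` is hypothetical. Two `parallel loop` compute regions sit inside a single `data` region, so the arrays are transferred to the accelerator once on entry and back once on exit, rather than around each loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: x <- alpha*x, then y <- y + x.
   The enclosing data region keeps x and y resident on the device
   across both compute regions. */
void scale_then_axpy(size_t n, double alpha,
                     double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    #pragma acc data copy(x[0:n], y[0:n])
    {
        #pragma acc parallel loop
        for (size_t k = 0; k < n; ++k)
            x[k] *= alpha;          /* first kernel: scale x */

        #pragma acc parallel loop
        for (size_t k = 0; k < n; ++k)
            y[k] += x[k];           /* second kernel reuses device copy of x */
    }
}
```

Without an OpenACC-enabled compiler the pragmas are ignored and the function runs serially, which is part of what makes the directive approach portable.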