# Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

@article{Eriksen2016EfficientAP, title={Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives}, author={Janus Juul Eriksen}, journal={Molecular Physics}, year={2016}, volume={115}, pages={2086 - 2101} }

ABSTRACT It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the…

## 13 Citations

Integral-direct and parallel implementation of the CCSD(T) method: algorithmic developments and large-scale applications.

- Chemistry · Journal of chemical theory and computation
- 2019

The efficiency of this implementation allowed us to perform some of the largest CCSD(T) calculations ever presented for systems of 31-43 atoms and 1037-1569 orbitals using only 4-8 many-core CPUs and 1-3 days of wall time.

Accurate Reduced-Cost CCSD(T) Energies: Parallel Implementation, Benchmarks, and Large-Scale Applications

- Chemistry · Journal of chemical theory and computation
- 2021

The accurate and systematically improvable frozen natural orbital (FNO) and natural auxiliary function (NAF) cost-reducing approaches are combined with recent coupled-cluster singles, doubles, and perturbative triples implementations to create the practically “gold standard” quality FNO-CCSD(T) method.

GPU acceleration of rank-reduced coupled-cluster singles and doubles.

- Chemistry, Physics · The Journal of chemical physics
- 2021

A graphical processing unit (GPU) accelerated implementation of the recently introduced rank-reduced coupled-cluster singles and doubles (RR-CCSD) method, which introduces a low-rank approximation of the doubles amplitudes, is presented. The accuracy of RR-CCSD is tested for a variety of chemical systems, and errors below 0.1% in the correlation energy are found to be achievable.

Many-Body Quantum Chemistry on Massively Parallel Computers.

- Computer Science, Chemistry · Chemical reviews
- 2020

The deployment of many-body quantum chemistry methods onto massively parallel high-performance computing (HPC) platforms is reviewed. The particular focus is on highly accurate methods that have…

Performance of Coupled-Cluster Singles and Doubles on Modern Stream Processing Architectures.

- Computer Science · Journal of chemical theory and computation
- 2020

We develop a new implementation of coupled-cluster singles and doubles (CCSD) optimized for the most recent graphical processing unit (GPU) hardware. We find that a single node with 8 NVIDIA V100…

Energy correction and analytic energy gradients due to triples in CCSD(T) with spin–orbit coupling on graphic processing units using single-precision data

- Physics, Chemistry · Molecular Physics
- 2021

ABSTRACT Calculating the contribution of triples ((T)) to the correlation energy, the density matrices and the constant terms in the Λ equation is the most expensive step in obtaining analytic…

Optimization of the linear-scaling local natural orbital CCSD(T) method: Redundancy-free triples correction using Laplace transform.

- Chemistry · The Journal of chemical physics
- 2017

An improved algorithm is presented for the evaluation of the (T) correction as a part of the local natural orbital (LNO) coupled-cluster singles and doubles with perturbative triples scheme. It enables the computation of LNO-CCSD(T) correlation energies with at least triple-zeta quality basis sets for realistic three-dimensional molecules.

Approaching the basis set limit of CCSD(T) energies for large molecules with local natural orbital coupled-cluster methods.

- Chemistry, Physics · Journal of chemical theory and computation
- 2019

It is demonstrated that the complete basis set limit (CBS) of LNO-CCSD(T) energies can be reliably approached via basis set extrapolation using large basis sets including diffuse functions.

GPU‐Accelerated Large‐Scale Excited‐State Simulation Based on Divide‐and‐Conquer Time‐Dependent Density‐Functional Tight‐Binding

- Computer Science · J. Comput. Chem.
- 2019

Numerical applications confirmed that the present GPU code significantly accelerated the TDDFTB calculations while maintaining accuracy. The DC‐TDDFTB simulation of 2‐acetylindan‐1,3‐dione displays excited‐state intramolecular proton transfer and provides absorption and fluorescence energies in reasonable agreement with the corresponding experimental values.

Single-precision open-shell CCSD and CCSD(T) calculations on graphics processing units.

- Physics, Computer Science · Physical chemistry chemical physics : PCCP
- 2020

It has been shown that coupled-cluster calculations with single-precision data are able to provide correlation energy with insignificant loss of accuracy. In this work, we employed consumer GPUs to…

## References

Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

- Physics · Journal of chemical theory and computation
- 2014

This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems, and finds the GPU outperforms the Phi for both square and nonsquare matrix multiplications.

Generating Efficient Quantum Chemistry Codes for Novel Architectures.

- Computer Science · Journal of chemical theory and computation
- 2013

It is suggested that the meta-programming and empirical performance optimization approach may be important in future computational chemistry applications, especially in the face of quickly evolving computer architectures.

GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems.

- Chemistry · Journal of chemical theory and computation
- 2011

It is demonstrated that a simple regularization of the cluster amplitudes used in the noniterative corrections accounting for the effect of triply excited configurations significantly improves the accuracies of ground-state energies in the presence of strong quasidegeneracy effects.

MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.

- Computer Science · Journal of chemical theory and computation
- 2013

In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers.

Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

- Computer Science · Journal of chemical theory and computation
- 2012

In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore…

Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

- Computer Science · Comput. Phys. Commun.
- 2017

Density-fitted singles and doubles coupled cluster on graphics processing units

- Computer Science
- 2014

We adapt an algorithm for singles and doubles coupled cluster (CCSD) that uses density fitting or Cholesky decomposition (CD) in the construction and contraction of all electron repulsion integrals…

Parallel Programming with OpenACC

- Computer Science
- 2016

Parallel Programming with OpenACC explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas, and presents the simplest way to leverage GPUs to achieve application speedups.

Towards dense linear algebra for hybrid GPU accelerated manycore systems

- Computer Science · Parallel Comput.
- 2010

Coupled cluster algorithms for networks of shared memory parallel processors

- Computer Science · Comput. Phys. Commun.
- 2007