# Accelerating QDP++ using GPUs

@article{Winter2011AcceleratingQU, title={Accelerating QDP++ using GPUs}, author={Frank Winter}, journal={ArXiv}, year={2011}, volume={abs/1105.2279} }

Graphics Processing Units (GPUs) are becoming increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture for controlling and exploiting the compute power of GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited for quantum field theory which provides vector data types and…
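To make the Expression Template technique concrete, here is a minimal CPU-only C++ sketch. The class names `Expr`, `Vec`, `Add`, `Scale` and the `Ref` helper are hypothetical and are not part of QDP++. The sketch only illustrates how an expression such as `c = a + 2.0 * b` is captured as a nested type and then evaluated element by element in a single loop on assignment, without allocating temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: every expression node E derives from Expr<E>, so operators
// can accept "any expression" while keeping the concrete type E.
template <typename E>
struct Expr {
  const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical leaf container (illustration only, not the QDP++ class).
class Vec : public Expr<Vec> {
 public:
  explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}

  // Construct from an expression: evaluated once, element by element.
  template <typename E>
  Vec(const Expr<E>& e) : data_(e.self().size()) { *this = e; }

  // Assignment runs a single fused loop; no temporary vectors are created.
  template <typename E>
  Vec& operator=(const Expr<E>& e) {
    const E& x = e.self();
    assert(x.size() == data_.size());
    for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = x[i];
    return *this;
  }

  std::size_t size() const { return data_.size(); }
  double operator[](std::size_t i) const { return data_[i]; }
  double& operator[](std::size_t i) { return data_[i]; }

 private:
  std::vector<double> data_;
};

// Storage policy: leaf vectors are held by reference, intermediate nodes
// by value, so a stored expression does not dangle once the
// full-expression that built it has ended.
template <typename T> struct Ref { using type = const T; };
template <> struct Ref<Vec> { using type = const Vec&; };

// Node for l + r: computes element i only when asked.
template <typename L, typename R>
class Add : public Expr<Add<L, R>> {
 public:
  Add(const L& l, const R& r) : l_(l), r_(r) {}
  std::size_t size() const { return l_.size(); }
  double operator[](std::size_t i) const { return l_[i] + r_[i]; }

 private:
  typename Ref<L>::type l_;
  typename Ref<R>::type r_;
};

// Node for s * e with a scalar s.
template <typename E>
class Scale : public Expr<Scale<E>> {
 public:
  Scale(double s, const E& e) : s_(s), e_(e) {}
  std::size_t size() const { return e_.size(); }
  double operator[](std::size_t i) const { return s_ * e_[i]; }

 private:
  double s_;
  typename Ref<E>::type e_;
};

// Operators only build nodes; they perform no arithmetic themselves.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
  return Add<L, R>(l.self(), r.self());
}

template <typename E>
Scale<E> operator*(double s, const Expr<E>& e) {
  return Scale<E>(s, e.self());
}
```

For example, with `Vec a(4, 1.0), b(4, 2.0), c(4);`, the statement `c = a + 2.0 * b;` fills every element of `c` with 5.0 in one pass. Storing intermediate nodes by value and only leaf vectors by reference is a choice of this sketch, not necessarily of QDP++.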

## 6 Citations

### Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

- Computer Science; 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
- 2012

This paper presents a general approach for automatically offloading C++ expression templates to CUDA-enabled GPUs. It uses C++ metaprogramming and Just-In-Time (JIT) compilation to generate and compile CUDA kernels for the corresponding expression templates, which are then executed with the appropriate arguments.
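The kernel-generation step of such an approach can be sketched in plain C++. Assuming the structure of an expression is fully encoded in its type, a recursive template can walk that type and emit CUDA C source for an element-wise kernel. All names here (`Leaf`, `Add`, `Mul`, `Gen`, `make_kernel`) are hypothetical and are not taken from the cited paper. The resulting string would still have to be compiled at run time (for example with NVIDIA's NVRTC library) and launched, which this sketch does not show.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types describing an element-wise expression tree.
// Only their types are used: the tree shape lives entirely in the type.
struct Leaf {};
template <typename L, typename R> struct Add {};
template <typename L, typename R> struct Mul {};

// Gen<E>::expr walks the type E and returns CUDA C code computing element i.
// Leaves become kernel arguments arg0, arg1, ... numbered left to right.
template <typename E> struct Gen;

template <> struct Gen<Leaf> {
  static std::string expr(int& next) {
    return "arg" + std::to_string(next++) + "[i]";
  }
};

template <typename L, typename R> struct Gen<Add<L, R>> {
  static std::string expr(int& next) {
    const std::string l = Gen<L>::expr(next);  // left first: fixed numbering
    const std::string r = Gen<R>::expr(next);
    return "(" + l + " + " + r + ")";
  }
};

template <typename L, typename R> struct Gen<Mul<L, R>> {
  static std::string expr(int& next) {
    const std::string l = Gen<L>::expr(next);
    const std::string r = Gen<R>::expr(next);
    return "(" + l + " * " + r + ")";
  }
};

// Emit the source of a one-thread-per-element kernel for expression E.
// The string still has to be JIT-compiled and launched (not shown).
template <typename E>
std::string make_kernel(const std::string& name) {
  assert(!name.empty());
  int n = 0;
  const std::string body = Gen<E>::expr(n);  // afterwards n == leaf count
  std::string params;
  for (int k = 0; k < n; ++k)
    params += ", const double* arg" + std::to_string(k);
  return "extern \"C\" __global__ void " + name + "(double* out, int size" +
         params + ")\n{\n  int i = blockIdx.x * blockDim.x + threadIdx.x;\n" +
         "  if (i < size) out[i] = " + body + ";\n}\n";
}
```

For instance, `make_kernel<Add<Leaf, Mul<Leaf, Leaf>>>("fused")` yields a kernel whose body is `out[i] = (arg0[i] + (arg1[i] * arg2[i]));`. Generating the left operand before the right one in separate statements keeps the argument numbering deterministic.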

### Porting Production Level Quantum Chromodynamics Code to Graphics Processing Units - A Case Study

- Computer Science; PARA
- 2012

A project to port an existing large lattice QCD codebase to GPUs and clusters of GPUs is described; the resulting simulator reproduces the original results while running up to 11 times faster than the highly optimized CPU code and meeting productivity requirements.

### Generating pure gauge lattice QCD configurations on GPUs with CUDA

- Computer Science, Physics; Comput. Phys. Commun.
- 2013

### QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

- Computer Science
- 2013

The multi-GPU open-source package QCDGPU is presented for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model; it is designed to produce lattice gauge configurations as well as to analyze previously generated ones.

### Computational Physics on Graphics Processing Units

- Physics; PARA
- 2012

Advances made in the field of computational physics are discussed, focusing on classical molecular dynamics and quantum simulations for electronic structure calculations using the density functional theory, wave function techniques and quantum field theory.

### Phase structure of five-dimensional anisotropic lattice gauge theories

- Physics
- 2016

The idea that we live in a higher-dimensional space was first introduced almost 100 years ago. In the past two decades many extra-dimensional models have been proposed in order to solve fundamental…

## 17 References

### Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

- Computer Science, Physics; 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
- 2010

This contribution describes the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation.

### QCD on GPUs: cost effective supercomputing

- Computer Science
- 2009

This work reviews the progress to date in using GPUs for large-scale calculations and contrasts GPUs with more traditional HPC architectures.

### Solving lattice QCD systems of equations using mixed precision solvers on GPUs

- Computer Science, Physics; Comput. Phys. Commun.
- 2010

### Finite temperature lattice QCD with GPUs

- Computer Science, Physics
- 2011

A performance comparison between GPU and CPU, in single and double precision, is presented for generating lattice SU(2) configurations used to compute the renormalized Polyakov loop and the string tension as functions of temperature.

### GPU-Based Conjugate Gradient Solver for Lattice QCD with Domain-Wall Fermions

- Computer Science, Physics
- 2010

This work presents a CG solver for the general 5-dimensional DWF operator on the NVIDIA CUDA architecture with mixed precision, using defect correction as well as the reliable-updates algorithm.
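Defect correction (iterative refinement) can be sketched on a small dense symmetric positive-definite matrix standing in for the domain-wall operator, an assumption made purely for brevity. An outer loop keeps the solution and the true residual in double precision, while an inner conjugate gradient solve of the correction equation runs in single precision. The names `cg_float` and `solve_mixed` and the tolerances are hypothetical; the reliable-updates variant is not shown, and this runs on the CPU rather than on a GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense SPD matrix standing in for the lattice Dirac operator.
using Mat = std::vector<std::vector<double>>;

// Inner solver: conjugate gradient entirely in single precision.
// Solves A e = r until the residual drops by rel_tol or max_iter is hit.
std::vector<float> cg_float(const Mat& A, const std::vector<float>& r,
                            int max_iter, float rel_tol) {
  const std::size_t n = r.size();
  std::vector<float> x(n, 0.0f), res = r, p = r, Ap(n);
  float rr = 0.0f;
  for (std::size_t i = 0; i < n; ++i) rr += res[i] * res[i];
  const float stop = rel_tol * std::sqrt(rr);
  for (int k = 0; k < max_iter && std::sqrt(rr) > stop; ++k) {
    float pAp = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
      Ap[i] = 0.0f;
      for (std::size_t j = 0; j < n; ++j)
        Ap[i] += static_cast<float>(A[i][j]) * p[j];
      pAp += p[i] * Ap[i];
    }
    const float alpha = rr / pAp;
    float rr_new = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
      x[i] += alpha * p[i];
      res[i] -= alpha * Ap[i];
      rr_new += res[i] * res[i];
    }
    const float beta = rr_new / rr;
    for (std::size_t i = 0; i < n; ++i) p[i] = res[i] + beta * p[i];
    rr = rr_new;
  }
  return x;
}

// Outer loop (defect correction): x and the true residual b - A x are kept
// in double; only the correction e is computed in float.
std::vector<double> solve_mixed(const Mat& A, const std::vector<double>& b,
                                double tol, int max_outer) {
  const std::size_t n = b.size();
  assert(A.size() == n);
  std::vector<double> x(n, 0.0), r(n);
  double bnorm = 0.0;
  for (double v : b) bnorm += v * v;
  bnorm = std::sqrt(bnorm);
  for (int outer = 0; outer < max_outer; ++outer) {
    double rnorm = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      r[i] = b[i];
      for (std::size_t j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
      rnorm += r[i] * r[i];
    }
    if (std::sqrt(rnorm) <= tol * bnorm) break;  // converged in double
    const std::vector<float> rf(r.begin(), r.end());  // demote residual
    const std::vector<float> e =
        cg_float(A, rf, 2 * static_cast<int>(n), 1e-3f);
    for (std::size_t i = 0; i < n; ++i) x[i] += e[i];  // promote, correct
  }
  return x;
}
```

For `A = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}}` and `b = {6, 10, 8}`, the solver should recover `x` close to `{1, 2, 3}` within the requested tolerance after a few outer iterations, even though every inner iteration uses `float`. Recomputing the residual in double at each outer step is what keeps single-precision rounding from limiting the final accuracy.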

### Implementation of the Neuberger-Dirac operator on GPUs

- Physics
- 2010

Recent developments have shown that a lot can be gained for QCD simulations from GPU hardware. This can be exploited especially in the case of Ginsparg-Wilson fermions when the computational costs…

### Investigation of hadron matter using lattice QCD and implementation of lattice QCD applications on heterogeneous multicore acceleration processors

- Physics, Computer Science
- 2012

In this work, new design concepts were developed for an active library (QDP++) harnessing the compute power of a heterogeneous multicore processor (the IBM PowerXCell 8i), making it possible to run a QDP++-based physics application (Chroma) with reasonable performance on the IBM BladeCenter QS22.

### Expression templates

- Computer Science
- 1996

In preliminary benchmark results, one compiler evaluates vector expressions at 95-99.5% efficiency of handcoded C using this technique (for long vectors), which is 2-15 times that of a conventional C++ vector class.