Accelerating QDP++ using GPUs

  • Frank Winter
  • Published 11 May 2011
  • Computer Science
  • ArXiv
Graphics Processing Units (GPUs) are becoming increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture for controlling and exploiting the compute power of GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited for quantum field theory which provides vector data types and…


Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

This paper presents a general approach for automatically offloading C++ expression templates to CUDA-enabled GPUs. It uses C++ metaprogramming and Just-In-Time compilation to generate and compile CUDA kernels for the corresponding expression templates, and then executes the kernels with the appropriate arguments.
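The code-generation step described above can be illustrated with a small sketch. It is not the paper's implementation: the node types `Field`, `Add`, `Mul` and the function `make_kernel_source` are hypothetical names, and the sketch only produces the text of a CUDA kernel from an expression tree. Handing that text to a runtime compiler and launching the result is left out. It also assumes each field appears at most once in an expression, since it does not deduplicate kernel parameters.

```cpp
#include <string>

// Hypothetical leaf node: a named array of doubles living on the device.
struct Field {
    std::string name;
    std::string code() const { return name + "[i]"; }          // per-element access
    std::string args() const { return "const double* " + name; } // kernel parameter
};

// Hypothetical binary nodes; each prints itself as a C expression.
template <typename L, typename R>
struct Add {
    L lhs;
    R rhs;
    std::string code() const { return "(" + lhs.code() + " + " + rhs.code() + ")"; }
    std::string args() const { return lhs.args() + ", " + rhs.args(); }
};

template <typename L, typename R>
struct Mul {
    L lhs;
    R rhs;
    std::string code() const { return "(" + lhs.code() + " * " + rhs.code() + ")"; }
    std::string args() const { return lhs.args() + ", " + rhs.args(); }
};

template <typename L, typename R>
Add<L, R> add(const L& l, const R& r) { return Add<L, R>{l, r}; }

template <typename L, typename R>
Mul<L, R> mul(const L& l, const R& r) { return Mul<L, R>{l, r}; }

// Emit CUDA source for a kernel that evaluates `expr` element-wise into `dest`.
// Assumption: every Field in `expr` has a distinct name.
template <typename E>
std::string make_kernel_source(const std::string& kernel,
                               const std::string& dest,
                               const E& expr) {
    return "extern \"C\" __global__ void " + kernel + "(double* " + dest + ", " +
           expr.args() + ", int n) {\n"
           "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
           "    if (i < n) " + dest + "[i] = " + expr.code() + ";\n"
           "}\n";
}
```

Calling `make_kernel_source("axpy", "out", add(Field{"a"}, mul(Field{"b"}, Field{"c"})))` returns a string whose body contains the fused statement `out[i] = (a[i] + (b[i] * c[i]));`. The point of the design is that the structure of the expression is known at compile time, so one kernel can be generated per distinct expression rather than one per operator.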

Porting Production Level Quantum Chromodynamics Code to Graphics Processing Units - A Case Study

This project ports an existing large lattice QCD codebase to run on GPUs and clusters of GPUs. The resulting simulator reproduces the original results while running up to 11 times faster than the highly optimized CPU code and meeting productivity requirements.

Generating pure gauge lattice QCD configurations on GPUs with CUDA

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

QCDGPU is a multi-GPU open-source package for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and O(N). It is designed to produce lattice gauge configurations as well as to analyze previously generated ones.

Computational Physics on Graphics Processing Units

Advances made in the field of computational physics are discussed, focusing on classical molecular dynamics and quantum simulations for electronic structure calculations using the density functional theory, wave function techniques and quantum field theory.

Phase structure of five-dimensional anisotropic lattice gauge theories

The idea that we live in a higher-dimensional space was first introduced almost 100 years ago. In the past two decades many extra-dimensional models have been proposed in order to solve fundamental

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

  • R. Babich, M. Clark, B. Joó
  • Computer Science, Physics
    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
This contribution describes the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation.

QCD on GPUs: cost effective supercomputing

This work reviews the progress to date in using GPUs for large-scale calculations and contrasts GPUs with more traditional HPC architectures.

Finite temperature lattice QCD with GPUs

This work presents a performance comparison between GPU and CPU, in single and double precision, for generating lattice SU(2) configurations for the renormalized Polyakov loop and the string tension as a function of temperature.

Multi-mass solvers for lattice QCD on GPUs

GPU-Based Conjugate Gradient Solver for Lattice QCD with Domain-Wall Fermions

This work designs a mixed-precision CG solver for the general 5-dimensional DWF operator on the NVIDIA CUDA architecture, using defect correction as well as reliable-update algorithms.
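Defect correction with mixed precision can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the solver from the work above: the operator `apply_op` is a stand-in (a shifted 1D Laplacian, which is symmetric positive definite) rather than a domain-wall operator, the names `cg_float` and `solve_mixed` are hypothetical, reliable updates are not shown, and the right-hand side is assumed to be nonzero. The outer loop keeps the solution and the true residual in double precision, while each correction is obtained from a loose single-precision CG solve.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using VecD = std::vector<double>;
using VecF = std::vector<float>;

// Stand-in operator (assumption): y_i = 2.5 x_i - x_{i-1} - x_{i+1}.
template <typename T>
void apply_op(const std::vector<T>& x, std::vector<T>& y) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        T v = T(2.5) * x[i];
        if (i > 0) v -= x[i - 1];
        if (i + 1 < n) v -= x[i + 1];
        y[i] = v;
    }
}

template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
    T s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Single-precision CG: approximately solves A e = r0, starting from e = 0,
// until the recursive residual has dropped by the factor rel_tol.
void cg_float(const VecF& r0, VecF& e, float rel_tol, int max_iter) {
    const std::size_t n = r0.size();
    VecF r = r0, p = r0, Ap(n);
    e.assign(n, 0.0f);
    float rr = dot(r, r);
    const float target = rel_tol * rel_tol * rr;
    for (int k = 0; k < max_iter && rr > target; ++k) {
        apply_op(p, Ap);
        const float alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            e[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const float rr_new = dot(r, r);
        const float beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
}

// Defect correction: double-precision residual, single-precision inner solve.
// Returns the last computed relative residual |b - A x| / |b|.
double solve_mixed(const VecD& b, VecD& x, double tol, int max_outer) {
    const std::size_t n = b.size();
    x.assign(n, 0.0);
    VecD Ax(n), r(n);
    VecF rf(n), ef(n);
    const double bnorm = std::sqrt(dot(b, b));
    double rel = 1.0;
    for (int outer = 0; outer < max_outer; ++outer) {
        apply_op(x, Ax);
        for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];  // true residual
        rel = std::sqrt(dot(r, r)) / bnorm;
        if (rel <= tol) break;
        for (std::size_t i = 0; i < n; ++i) rf[i] = static_cast<float>(r[i]);
        cg_float(rf, ef, 1e-3f, 200);                              // cheap correction
        for (std::size_t i = 0; i < n; ++i) x[i] += static_cast<double>(ef[i]);
    }
    return rel;
}
```

For example, `solve_mixed(b, x, 1e-10, 20)` with `b` set to a vector of ones drives the double-precision residual below the requested tolerance even though every inner solve runs in `float`. Because the residual is recomputed in double precision at each outer step, single-precision rounding limits only how much each correction gains, not the final accuracy.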

Implementation of the Neuberger-Dirac operator on GPUs

Recent developments have shown that a lot can be gained for QCD simulations from GPU hardware. This can be exploited especially in the case of Ginsparg-Wilson fermions when the computational costs

Investigation of hadron matter using lattice QCD and implementation of lattice QCD applications on heterogeneous multicore acceleration processors

In this work, new design concepts were developed for an active library (QDP++) that harnesses the compute power of a heterogeneous multicore processor (the IBM PowerXCell 8i). With them, a QDP++-based physics application (Chroma) ran with reasonable performance on the IBM BladeCenter QS22.

Expression templates

In preliminary benchmark results, one compiler evaluates vector expressions at 95-99.5% efficiency of handcoded C using this technique (for long vectors), which is 2-15 times that of a conventional C++ vector class.
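The technique can be illustrated with a minimal sketch. The names `Expr`, `AddExpr` and `Vec` are hypothetical and are not taken from QDP++ or from the work summarized above. The idea is that `operator+` returns a lightweight node describing the sum instead of computing it, and the assignment operator then evaluates the whole expression in one loop without temporary vectors.

```cpp
#include <cstddef>
#include <vector>

// CRTP base class: tags a type as an expression node and recovers its real type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for the element-wise sum of two expressions. It only stores references
// to its operands; nothing is computed until an element is requested.
template <typename L, typename R>
struct AddExpr : Expr<AddExpr<L, R>> {
    const L& lhs;
    const R& rhs;
    AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Concrete vector that owns its data.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression runs a single fused loop.
    // Assumption: the expression has at least as many elements as this vector.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Building the sum only creates a node; the work happens in Vec::operator=.
template <typename L, typename R>
AddExpr<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return AddExpr<L, R>(l.self(), r.self());
}
```

With vectors `a`, `b`, `c` and `d` of equal length, the statement `d = a + b + c;` compiles to one loop computing `d[i] = a[i] + b[i] + c[i]`, which is why the approach can approach hand-written loop performance. The nodes hold references, so an expression should be assigned in the same statement that builds it rather than stored for later use.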

The Chroma Software System for Lattice QCD