Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations important to nuclear and particle physics. The QUDA library provides a package of mixed-precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device…
Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase which accounts for a substantial fraction of the workload in a typical LQCD…
Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high performance computing. Nvidia's CUDA platform, in particular, offers direct access to graphics hardware through a programming…
The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency at remote observatory sites parallels that in HPC broadly, where efficiency is a critical metric. We investigate how the performance-per-watt of…
We discuss the calculation of disconnected diagrams needed for determining the strange quark content of the nucleon on the lattice. We present results for the strange scalar form factor and the related parameter $f_{T_s}$, which enters into the cross-section for the scattering of dark matter off nuclei in supersymmetric extensions of the Standard Model. In…
The Möbius domain wall action is a generalization of Shamir's action which, as the separation $L_s$ between the domain walls is taken to infinity, yields exactly the same overlap fermion lattice action. The performance advantages of the algorithm are presented for small ensembles of quenched, full QCD domain wall and Gap domain wall lattices. In…
We discuss methods for the calculation of disconnected diagrams and their application to various form factors of the nucleon. In particular, we present preliminary results for the strange contribution to the scalar and axial form factors, calculated with $N_f = 2$ dynamical flavors of Wilson fermions on an anisotropic lattice.
Using the CUDA platform, we have implemented a mixed-precision Krylov solver for the Wilson-Dirac matrix for lattice QCD. The matrix-vector product, which accounts for the vast majority of the operations, runs at over 130 Gflops in single precision on the GTX 280. We have developed a new approach for mixed-precision Krylov solvers that achieves in excess…
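The abstract is cut off before it describes the new approach, so the following is only a rough, hypothetical illustration of the general idea behind mixed-precision Krylov solvers, not the authors' method. It uses classic defect correction (iterative refinement). An outer loop computes the true residual in double precision, and an inner conjugate gradient (CG) solve computes each correction in emulated single precision. Several assumptions apply. A small dense symmetric positive-definite matrix stands in for the Wilson-Dirac operator; because that operator is not Hermitian, a real solver would use CG on the normal equations or a method such as BiCGstab. Single precision is emulated by rounding vectors to IEEE binary32 with Python's `struct` module. The names `cg_single` and `mixed_precision_solve`, along with all tolerances, are illustrative choices.

```python
import math
import struct


def to_f32(v: float) -> float:
    """Round a binary64 Python float to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (hypothetical stand-in for the Dirac operator)."""
    return [dot(row, x) for row in A]


def cg_single(A, b, tol, max_iter):
    """Conjugate gradient with every vector rounded to binary32 after each update.

    This only emulates single precision: dot products are still accumulated
    in double before the results are rounded.
    """
    x = [0.0] * len(b)
    r = [to_f32(v) for v in b]
    p = list(r)
    rr = dot(r, r)
    target = tol * math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= target:
            break
        Ap = [to_f32(v) for v in matvec(A, p)]
        alpha = rr / dot(p, Ap)
        x = [to_f32(xi + alpha * pi) for xi, pi in zip(x, p)]
        r = [to_f32(ri - alpha * api) for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [to_f32(ri + beta * pi) for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def mixed_precision_solve(A, b, tol=1e-12, inner_tol=1e-4, max_outer=50):
    """Defect correction: double-precision residuals, single-precision corrections.

    Returns the solution and the number of outer iterations performed.
    """
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_outer):
        # True residual r = b - A x, computed in double precision.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        # Approximately solve A e = r at low precision, then update x in double.
        e = cg_single(A, r, inner_tol, max_iter=10 * len(b))
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_outer
```

Consider a 20×20 tridiagonal matrix with 4 on the diagonal and -1 off it, and a right-hand side of all ones. The outer loop should reach the 1e-12 relative target in a handful of iterations. Calling `cg_single` alone would instead stall near single-precision accuracy. The design keeps double-precision accuracy in the final answer while most of the arithmetic runs at the lower precision. On a GPU, single-precision vectors need half the memory traffic of double-precision ones, which typically matters most for the bandwidth-bound matrix-vector product.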