Implementing the Himeno benchmark with CUDA on GPU clusters


This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that utilizes MPI alongside CUDA…
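
To put the abstract's figures in context, the following is a minimal back-of-the-envelope sketch, not the authors' method. It rests on two assumptions that do not appear in the abstract: a theoretical peak memory bandwidth of about 102 GB/s for the Tesla C1060 (the vendor's published figure), and the conventional Himeno count of 34 floating-point operations per grid point. The function and constant names are hypothetical, and the abstract's "over 83%" and "over 50 GFlops" are treated as lower bounds.

```python
# Rough arithmetic relating the abstract's figures to each other.
# ASSUMPTION: ~102 GB/s peak bandwidth for the Tesla C1060 (vendor spec,
# not stated in the abstract).
ASSUMED_PEAK_GBS = 102.0
# ASSUMPTION: 34 floating-point operations per grid point, the count
# conventionally used by the Himeno benchmark (not stated in the abstract).
ASSUMED_FLOPS_PER_POINT = 34


def achieved_bandwidth_gbs(peak_gbs: float, fraction: float) -> float:
    """Return the sustained bandwidth in GB/s for a given fraction of peak."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return peak_gbs * fraction


def points_per_second(gflops: float, flops_per_point: int) -> float:
    """Return grid-point updates per second implied by a GFlops rate."""
    if flops_per_point <= 0:
        raise ValueError("flops_per_point must be positive")
    return gflops * 1e9 / flops_per_point


if __name__ == "__main__":
    bandwidth = achieved_bandwidth_gbs(ASSUMED_PEAK_GBS, 0.83)
    rate = points_per_second(50.0, ASSUMED_FLOPS_PER_POINT)
    print(f"Sustained bandwidth at 83% of peak: {bandwidth:.2f} GB/s")
    print(f"Grid-point updates at 50 GFlops: {rate:.3e} per second")
```

Under these assumptions, 83% of peak corresponds to roughly 85 GB/s of sustained memory traffic, which is consistent with the abstract's framing of the benchmark as limited by memory bandwidth rather than by arithmetic throughput.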
DOI: 10.1109/IPDPS.2010.5470394



Semantic Scholar estimates that this publication has 61 citations based on the available data.