Architecture-Aware Mapping and Optimization on a 1600-Core GPU

Authors: Mayank Daga, Thomas Scogland, and Wu-chun Feng
Venue: 2011 IEEE 17th International Conference on Parallel and Distributed Systems

The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task: it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on an AMD GPU…
This paper has 31 citations.


