Automatic C-to-CUDA Code Generation for Affine Programs

Abstract

Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicated. Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest. This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular (affine) programs. Using and adapting publicly available tools that have made polyhedral compiler optimization practically effective, we develop a C-to-CUDA transformation system that generates two-level parallel CUDA code that is optimized for efficient data access. The performance of automatically generated code is compared with manually optimized CUDA code for a number of benchmarks. The performance of the automatically generated CUDA code is quite close to hand-optimized CUDA code and considerably better than the benchmarks’ performance on a multicore CPU.
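As a rough illustration of what "two-level parallel" means for an affine loop nest, the sketch below shows a sequential matrix-vector product next to a version whose outer loop is split into a block level and a thread level, mirroring CUDA's grid of thread blocks. This is a hand-written example in plain C, not output of the system described in the paper; the function names, the `threads_per_block` parameter, and the choice of kernel are assumptions made here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential affine loop nest: y = A * x, with A stored row-major (n x n).
 * Loop bounds and array subscripts are affine functions of i and j. */
void matvec_seq(size_t n, const float *A, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Same computation with the i-loop split into two levels.
 * `block` stands in for CUDA's blockIdx.x and `thread` for threadIdx.x
 * (hypothetical mapping, emulated here with ordinary loops).
 * Each (block, thread) pair writes a distinct y[i], so the iterations
 * are independent and could run in parallel on a GPU. */
void matvec_two_level(size_t n, size_t threads_per_block,
                      const float *A, const float *x, float *y)
{
    assert(threads_per_block > 0);
    size_t num_blocks = (n + threads_per_block - 1) / threads_per_block;

    for (size_t block = 0; block < num_blocks; block++) {
        for (size_t thread = 0; thread < threads_per_block; thread++) {
            size_t i = block * threads_per_block + thread;
            if (i >= n)
                continue; /* guard for a partially filled last block */

            float acc = 0.0f;
            for (size_t j = 0; j < n; j++)
                acc += A[i * n + j] * x[j];
            y[i] = acc;
        }
    }
}
```

Both functions fill `y` with the same values for any `n` and any positive `threads_per_block`. The sketch covers only the work decomposition; the data-access optimizations the abstract mentions are not shown.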

DOI: 10.1007/978-3-642-11970-5_14

Semantic Scholar estimates that this publication has 201 citations based on the available data.

Cite this paper

@inproceedings{Baskaran2010AutomaticCC,
  title={Automatic C-to-CUDA Code Generation for Affine Programs},
  author={Muthu Manikandan Baskaran and J. Ramanujam and P. Sadayappan},
  booktitle={CC},
  year={2010}
}