Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Abstract

Several parallel architectures, such as GPUs and the Cell processor, have fast, explicitly managed on-chip memories in addition to slow off-chip memory. They also offer very high computational power with multiple levels of parallelism. A significant challenge in programming these architectures is to effectively exploit the available parallelism and to manage the fast memories so as to maximize performance. In this paper, we develop an approach to effective automatic data management for on-chip memories, including the creation of buffers in on-chip (local) memories to hold portions of the data accessed in a computational block, automatic determination of array access functions for local buffer references, and generation of code that moves data between slow off-chip memory and fast local memories. We also address the problem of mapping computation in regular programs onto multi-level parallel architectures using a multi-level tiling approach, and study the impact of on-chip memory availability on the selection of tile sizes at the various levels. Experimental results on a GPU demonstrate the effectiveness of the proposed approach.

DOI: 10.1145/1345206.1345210



Cite this paper

@inproceedings{Baskaran2008AutomaticDM,
  title     = {Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories},
  author    = {Muthu Manikandan Baskaran and Uday Bondhugula and Sriram Krishnamoorthy and J. Ramanujam and Atanas Rountev and P. Sadayappan},
  booktitle = {PPOPP},
  year      = {2008}
}