High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library

Abstract

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually tuned code written in low-level approaches such as OpenCL and CUDA. This makes the development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach, which combines high-level programming abstractions with competitive performance on multi-GPU systems. SkelCL extends the OpenCL standard with three high-level features: 1) pre-implemented parallel patterns (a.k.a. skeletons); 2) container data types for vectors and matrices; and 3) an automatic data (re)distribution mechanism. We introduce two new SkelCL skeletons that specifically target stencil computations, MapOverlap and Stencil. We describe their use for particular application examples, discuss their efficient parallel implementation, and report experimental results on systems with multiple GPUs. Our evaluation of three real-world applications shows that stencil code written with SkelCL is considerably shorter and offers performance competitive with hand-tuned OpenCL code.

Electronic version of an article published as Parallel Processing Letters, Volume 24, Issue 03, September 2014, 17 pages. DOI: 10.1142/S0129626414410059. © World Scientific Publishing Company. Journal URL: http://www.worldscientific.com/worldscinet/ppl
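To make the MapOverlap idea and the multi-GPU data distribution more concrete, here is a minimal sequential C++ sketch, assuming a row-major matrix and a boundary policy that returns a neutral value for out-of-range neighbours. It is not SkelCL's API: the names `Matrix`, `Neighborhood`, `mapOverlap`, `Block` and `blockDistributeWithHalo` are hypothetical and only model the semantics on the CPU. The second function illustrates one plausible way (an assumption, not taken from the paper) to split a matrix row-wise across several GPUs so that each block carries the extra "halo" rows a stencil of radius `d` needs from its neighbours.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row-major matrix standing in for a SkelCL-style Matrix container.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c, float v = 0.0f)
        : rows(r), cols(c), data(r * c, v) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Read-only view of the neighbourhood around element (ci, cj).
// Out-of-range reads return `neutral` (assumed boundary-handling policy).
struct Neighborhood {
    const Matrix& m;
    long ci, cj;
    float neutral;
    float get(long di, long dj) const {
        long i = ci + di, j = cj + dj;
        if (i < 0 || j < 0 || i >= (long)m.rows || j >= (long)m.cols) return neutral;
        return m.at((std::size_t)i, (std::size_t)j);
    }
};

// Sequential model of MapOverlap: apply f to every element, giving f
// access to that element's neighbours through the Neighborhood view.
Matrix mapOverlap(const Matrix& in, float neutral,
                  const std::function<float(const Neighborhood&)>& f) {
    Matrix out(in.rows, in.cols);
    for (std::size_t i = 0; i < in.rows; ++i)
        for (std::size_t j = 0; j < in.cols; ++j)
            out.at(i, j) = f(Neighborhood{in, (long)i, (long)j, neutral});
    return out;
}

// Row range [firstRow, lastRow) held by one device, including halo rows.
struct Block { std::size_t firstRow, lastRow; };

// Split `rows` rows as evenly as possible over `devices` devices and extend
// each block by `d` halo rows above and below (clamped to the matrix).
std::vector<Block> blockDistributeWithHalo(std::size_t rows, std::size_t devices,
                                           std::size_t d) {
    std::vector<Block> blocks;
    std::size_t base = rows / devices, rem = rows % devices, start = 0;
    for (std::size_t k = 0; k < devices; ++k) {
        std::size_t len = base + (k < rem ? 1 : 0);
        std::size_t lo = start >= d ? start - d : 0;
        std::size_t hi = std::min(rows, start + len + d);
        blocks.push_back({lo, hi});
        start += len;
    }
    return blocks;
}
```

For example, a five-point sum over a 3x3 matrix of ones yields 5 in the centre, 4 on edges and 3 in corners, because missing neighbours contribute the neutral value 0; splitting 10 rows over 2 devices with radius 1 gives the overlapping row ranges [0, 6) and [4, 10). In SkelCL itself the computation runs as OpenCL kernels on the GPUs; this sketch only captures what is computed, not how it is parallelised.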

9 Figures and Tables

Cite this paper

@article{Steuwer2014HighLevelPO,
  title={High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library},
  author={Michel Steuwer and Michael Haidl and Stefan Breuer and Sergei Gorlatch},
  journal={Parallel Processing Letters},
  year={2014},
  volume={24}
}