Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons

@inproceedings{Nugteren2012IntroducingA,
  title={Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons},
  author={Cedric Nugteren and Henk Corporaal},
  booktitle={GPGPU-5},
  year={2012}
}
Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still require a large amount of effort from programmers. A number of parallelizing source-to-source compilers have recently been developed to ease programming of multi-core and many-core processors. This work presents and evaluates a number of… 

The Bones Source-to-Source Compiler Manual
TLDR
This document serves as a manual for users of Bones. It contains an overview of the tool itself and its skeletons, and is a mandatory read for users who plan to modify or extend the skeletons and/or targets currently available in Bones.
Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification
TLDR
A tool is used to automatically annotate C code with species information where possible, which results in a unique approach, integrating a skeleton-based compiler for the first time into an automated flow.
Automatic Parallelization Tool : Classification of Program Code for Parallel Computing
TLDR
This work investigates current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge such classification.
Algorithmic Species : Classifying Program Code for Parallel Computing
TLDR
A new algorithm classification, ‘Algorithmic Species’, is introduced, which encapsulates information relevant for parallelization in classes and embeds memory transfer requirements to optimize communication on heterogeneous platforms; the authors also design ASET, which automatically identifies 99% of the algorithmic species and automatically extracts memory transfer requirements.
Algorithmic species: A classification of affine loop nests for parallel programming
TLDR
This work introduces algorithmic species, a classification of affine loop nests based on the polyhedral model and targeted for both automatic and manual use that can help programmers to find opportunities for parallelization, reason about their code, and interact with other programmers.
On source-to-source compilers
Computer technologies such as programming languages and hardware have been evolving for the past few decades. Many computer programs need to be maintained and rewritten when new equipment is released.
Classifying a program code for parallel computing against HPCC
  • Mustafa Basthikodi, W. Ahmed
  • 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), 2016
TLDR
This work investigates current species for the classification of algorithms; related work on classification is discussed along with a comparison of the issues that challenge such classification, and new theories are implemented in the tool, enabling automatic characterization of program code.
Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons
We propose a framework based on an original generation and use of algorithmic skeletons, dedicated to the speculative parallelization of scientific nested-loop kernels, able to apply at run-time…
SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems
TLDR
This article presents SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems, and proposes a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations.
Multi-GPU support on the marrow algorithmic skeleton framework
TLDR
Marrow was able to achieve a good balance between simplicity of the programming model and performance, obtaining good scalability when using multiple GPUs, with an efficient load distribution, although at the price of some overhead when using a single GPU.

References

Showing 1-10 of 26 references
Source-to-Source Code Translator: OpenMP C to CUDA
TLDR
A source-to-source compiler able to automatically transform OpenMP C code into CUDA code, while maintaining a human-readable version of the code that can be further analyzed or optimized, is proposed.
Skeleton-based automatic parallelization of image processing algorithms for GPUs
TLDR
This paper presents a technique to automatically parallelize and map sequential code on a GPU, without the need for code-annotations, and uses domain specific skeletons and a finer-grained classification of algorithms.
SkePU: a multi-backend skeleton programming library for multi-GPU systems
TLDR
The results show that a skeleton approach to GPU programming is viable, especially when the computation burden is large compared to memory I/O (the lazy memory copying can help to achieve this), and that utilizing several GPUs has the potential for performance gains.
Static Compilation Analysis for Host-Accelerator Communication Optimization
We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known…
Algorithmic skeletons for stream programming in embedded heterogeneous parallel image processing applications
TLDR
This paper presents a C-like skeleton implementation language, PEPCI, that uses term rewriting and partial evaluation to specify skeletons for parallel C dialects, and provides a stream programming language that is better tailored to the user as well as the underlying architecture.
Automatic C-to-CUDA Code Generation for Affine Programs
TLDR
An automatic code transformation system that generates parallel CUDA code from input sequential C code for regular (affine) programs; the generated code is quite close to hand-optimized CUDA code and performs considerably better than the benchmarks on a multicore CPU.
CUDA-Lite: Reducing GPU Programming Complexity
TLDR
CUDA-lite, an enhancement to CUDA, is presented, along with preliminary results indicating that auto-generated code can have performance comparable to hand-written code.
A modular and parameterisable classification of algorithms
TLDR
A new algorithm classification is introduced that uses a limited vocabulary and a well-defined grammar, creating a modular classification that is parameterisable; this modularity and parameterisability enable a very fine-grained and widely applicable classification.
SkelCL - A Portable Skeleton Library for High-Level GPU Programming
TLDR
This work proposes SkelCL -- a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution that greatly simplifies programming GPU systems.
hiCUDA: High-Level GPGPU Programming
TLDR
hiCUDA, a high-level directive-based language for CUDA programming, is designed; it allows programmers to perform tedious tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process.