Skeleton-based automatic parallelization of image processing algorithms for GPUs

@article{Nugteren2011SkeletonbasedAP,
  title={Skeleton-based automatic parallelization of image processing algorithms for GPUs},
  author={Cedric Nugteren and Henk Corporaal and Bart Mesman},
  journal={2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation},
  year={2011},
  pages={25-32}
}
  • C. Nugteren, H. Corporaal, B. Mesman
  • Published 18 July 2011
  • Computer Science
  • 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, leading to the necessity to automate this process. In this paper, we present a technique to automatically parallelize and map sequential code on a GPU, without the need for code-annotations. This technique is based on skeletonization and is targeted at image… 
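The abstract above describes recognizing parallelizable patterns (skeletons) in plain sequential code and emitting GPU code for them. As a rough, hand-written illustration of that idea, and not the paper's actual tool output, the sketch below pairs a sequential pixel-to-pixel thresholding loop with the kind of CUDA kernel and launch configuration a skeleton-based parallelizer might generate for it; all function names and parameters are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Sequential reference: a pixel-to-pixel (element-wise) operation,
// the simplest class of code a skeleton-based parallelizer can recognize.
void threshold_cpu(const unsigned char* in, unsigned char* out,
                   int width, int height, unsigned char t) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = (in[y * width + x] > t) ? 255 : 0;
}

// Hypothetical generated kernel: one thread per pixel, matching the
// pixel-to-pixel skeleton. The guard handles image sizes that are not
// multiples of the block size.
__global__ void threshold_gpu(const unsigned char* in, unsigned char* out,
                              int width, int height, unsigned char t) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = (in[y * width + x] > t) ? 255 : 0;
}

int main() {
    const int width = 640, height = 480;
    const size_t bytes = size_t(width) * height;

    unsigned char *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 128, bytes);  // dummy input image

    // A skeleton-based tool would also derive the launch configuration.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    threshold_gpu<<<grid, block>>>(d_in, d_out, width, height, 100);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    printf("done\n");
    return 0;
}
```

The one-thread-per-pixel mapping and the boundary guard follow directly from the pixel-to-pixel skeleton; more involved skeletons (neighbourhood or reduction operations) would require different indexing and synchronization.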

Citations

Rapid prototyping of image processing workflows on massively parallel architectures
TLDR
A graphical environment for the design of image-processing workflows automatically generates all the CUDA code, including NPP calls, necessary to run the application on a GPU; the generated code is almost as efficient as the equivalent hand-written program and, in the typical case, 10 times faster than running on the CPU alone.
Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons
TLDR
A new classification of algorithms is used in a new source-to-source compiler, which is based on the algorithmic skeletons technique, and it is demonstrated that the presented compiler requires little modifications to the original sequential source code, generates readable code for further fine-tuning, and delivers superior performance compared to other tools for a set of 8 image processing kernels.
Parallel Implementation of Color Based Image Retrieval Using CUDA on the GPU
TLDR
This research work makes extensive use of the highly multithreaded architecture of a many-core GPU to parallelize color-based image retrieval using color moments; the parallel implementation is much faster than the sequential one.
Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators
TLDR
A compiler-based solution is proposed to partially address the three "P" properties: Performance, Portability, and Programmability; a prototype, Par4All, is implemented and validated with numerous experiments.
Parallel Implementation of Texture Based Image Retrieval on The GPU
TLDR
The main goal of this research work is to parallelize texture-based image retrieval using entropy, standard deviation, and local range; the parallel implementation is much faster than the sequential one.
A novel graphics processor architecture based on partial stream rewriting
TLDR
This work models the complete rendering pipeline as a functional program, which is represented as a stream of tokens and iteratively modified by a set of rewriting rules; this enables dynamic thread creation, lock-free synchronization, and light-weight scheduling based on pattern matching.
Accelerating Image Algorithm Development using Soft Co-Processors on FPGAs
TLDR
A system model is presented based on a set of Soft Co-Processors, each of which implements a basic image-level operation derived from the high-level operators of Image Algebra, enabling algorithm development to take place on the FPGA itself.
Saliency Detection on FPGA Using Accelerators and Evaluation of Algorithmic Skeletons
TLDR
The high-level synthesis tool shows promise for implementing skeletons, and a speed-up of 3 times is achieved compared to an Intel Core i5 running at 2.53 GHz.
Evaluating the Performance and Portability of OpenCL
TLDR
To what extent OpenCL is a suitable substitute for current programming standards is the main topic of this thesis, which presents a detailed comparison and analysis of the performance of several image-processing algorithms implemented in both CUDA and OpenCL and mapped onto an NVIDIA GPU.

References

SHOWING 1-10 OF 22 REFERENCES
Algorithmic skeletons for stream programming in embedded heterogeneous parallel image processing applications
TLDR
This paper presents a C-like skeleton implementation language, PEPCI, that uses term rewriting and partial evaluation to specify skeletons for parallel C dialects, and provides a stream programming language that is better tailored to the user as well as the underlying architecture.
SkePU: a multi-backend skeleton programming library for multi-GPU systems
TLDR
The results show that a skeleton approach to GPU programming is viable, especially when the computational burden is large compared to memory I/O (lazy memory copying can help achieve this), and that utilizing several GPUs has the potential for performance gains.
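SkePU's real API is not reproduced here; purely as a generic sketch of the skeleton-programming style it represents, the snippet below (all names assumed) wraps the CUDA indexing and launch boilerplate in a reusable element-wise map skeleton so that the user supplies only a per-element functor.

```cuda
#include <cuda_runtime.h>

// Generic element-wise "map" skeleton: the library owns indexing and
// launch configuration, the user supplies only the per-element functor.
template <typename T, typename Func>
__global__ void map_kernel(const T* in, T* out, int n, Func f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = f(in[i]);
}

template <typename T, typename Func>
void map_skeleton(const T* d_in, T* d_out, int n, Func f) {
    int block = 256;
    int grid = (n + block - 1) / block;
    map_kernel<<<grid, block>>>(d_in, d_out, n, f);
}

// Example user functor: invert an 8-bit pixel.
struct Invert {
    __device__ unsigned char operator()(unsigned char p) const {
        return 255 - p;
    }
};

// Usage (device pointers assumed to be allocated elsewhere):
//   map_skeleton(d_in, d_out, width * height, Invert{});
```

Multi-backend libraries such as SkePU build on this idea with backend selection and the lazy memory copying mentioned in the summary above.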
Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries
TLDR
An extension to the CUDA tool-chain is described, providing programmers with a visualization of register life ranges, and guidelines describing how to apply optimizations in order to obtain a lower register pressure are presented.
GPU Kernels as Data-Parallel Array Computations in Haskell
We present a novel high-level parallel programming model for graphics processing units (GPUs). We embed GPU kernels as data-parallel array computations in the purely functional language Haskell. GPU…
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
TLDR
A skeletal parallel programming framework is presented that enables programmers to easily write and rapidly test GPGPU applications, and provides an optimization mechanism based on fusion transformation whose effectiveness was confirmed experimentally.
hiCUDA: a high-level directive-based language for GPU programming
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate…
GpuCV: an opensource GPU-accelerated framework for image processing and computer vision
TLDR
The GpuCV framework transparently manages hardware capabilities, data synchronization, activation of low level GLSL and CUDA programs, on-the-fly benchmarking and switching to the most efficient implementation and finally offers a set of image processing operators with GPU acceleration available.
Towards a general framework for FPGA based image processing using hardware skeletons
CUDA-Lite: Reducing GPU Programming Complexity
TLDR
CUDA-lite, an enhancement to CUDA, is presented, and preliminary results are shown that indicate auto-generated code can have performance comparable to hand coding.