Skeleton-based automatic parallelization of image processing algorithms for GPUs
@article{Nugteren2011SkeletonbasedAP,
  title={Skeleton-based automatic parallelization of image processing algorithms for GPUs},
  author={Cedric Nugteren and Henk Corporaal and Bart Mesman},
  journal={2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation},
  year={2011},
  pages={25-32}
}
Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, leading to the necessity to automate this process. In this paper, we present a technique to automatically parallelize and map sequential code on a GPU, without the need for code-annotations. This technique is based on skeletonization and is targeted at image…
31 Citations
Rapid prototyping of image processing workflows on massively parallel architectures
- Computer Science, Proceedings of the 10th International Workshop on Intelligent Solutions in Embedded Systems
- 2012
A graphical environment for the design of image processing workflows that automatically generates all the CUDA code including NPP calls necessary to run the application on a GPU, which is almost as efficient as the equivalent hand written program and 10 times faster than running on the CPU alone in the typical case.
Optimizing convolution operations on GPUs using adaptive tiling
- Computer Science, Future Gener. Comput. Syst.
- 2014
Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons
- Computer Science, GPGPU-5
- 2012
A new classification of algorithms is used in a new source-to-source compiler based on the algorithmic skeletons technique. It is demonstrated that the presented compiler requires few modifications to the original sequential source code, generates readable code for further fine-tuning, and delivers superior performance compared to other tools for a set of 8 image processing kernels.
Parallel Implementation of Color Based Image Retrieval Using CUDA on the GPU
- Computer Science
- 2014
This research work makes extensive use of the highly multithreaded architecture of multi-core GPUs to parallelize color-based image retrieval through color moments, making the whole process much faster than normal.
Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators. (Transformations de programme automatiques et source-à-source pour accélérateurs matériels de type GPU)
- Computer Science
- 2012
A compiler-based solution to partially answer the three "P" properties (Performance, Portability, and Programmability) is proposed, and a prototype, Par4All, is implemented and validated with numerous experiments.
Parallel Implementation of Texture Based Image Retrieval on The GPU
- Computer Science
- 2013
The main goal of this research work is to parallelize texture-based image retrieval through entropy, standard deviation, and local range; the whole process is also much faster than normal.
A novel graphics processor architecture based on partial stream rewriting
- Computer Science, 2013 Conference on Design and Architectures for Signal and Image Processing
- 2013
This work models the complete rendering pipeline as a functional program, which is then represented as a stream of tokens and iteratively modified by a set of rewriting rules. This enables dynamic thread creation, lock-free synchronization, and lightweight scheduling based on pattern matching.
Accelerating Image Algorithm Development using Soft Co-Processors on FPGAs
- Computer Science, 2018 29th Irish Signals and Systems Conference (ISSC)
- 2018
A system model based on a set of Soft Co-Processors, each of which implements a basic image-level operation based on the high-level operators in Image Algebra, enabling algorithm development to take place on the FPGA itself.
Saliency Detection on FPGA Using Accelerators and Evaluation of Algorithmic Skeletons
- Computer Science
- 2012
The high-level synthesis tool proves promising for use with skeletons, and a speed-up of 3 times is achieved compared to an Intel Core i5 running at 2.53 GHz.
Evaluating the Performance and Portability of OpenCL
- Computer Science
- 2011
The main topic of interest in this thesis is the extent to which OpenCL is a suitable substitute for current programming standards. It presents a detailed comparison and analysis of the performance of several image-processing algorithms implemented in both CUDA and OpenCL and mapped onto an NVIDIA GPU.
References
Algorithmic skeletons for stream programming in embedded heterogeneous parallel image processing applications
- Computer Science, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
- 2006
This paper presents a C-like skeleton implementation language, PEPCI, that uses term rewriting and partial evaluation to specify skeletons for parallel C dialects, and provides a stream programming language that is better tailored to the user as well as the underlying architecture.
SkePU: a multi-backend skeleton programming library for multi-GPU systems
- Computer Science, HLPP '10
- 2010
The results show that a skeleton approach to GPU programming is viable, especially when the computation burden is large compared to memory I/O (the lazy memory copying can help to achieve this), and that utilizing several GPUs has potential for performance gains.
Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries
- Computer Science
- 2012
An extension to the CUDA tool-chain is described, providing programmers with a visualization of register life ranges, and guidelines describing how to apply optimizations in order to obtain a lower register pressure are presented.
GPU Kernels as Data-Parallel Array Computations in Haskell
- Computer Science
- 2009
We present a novel high-level parallel programming model for graphics processing units (GPUs). We embed GPU kernels as data-parallel array computations in the purely functional language Haskell. GPU…
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
- Computer Science, APLAS
- 2009
A skeletal parallel programming framework that enables programmers to easily write GPGPU applications and rapidly test them and provides an optimization mechanism based on fusion transformation that was confirmed experimentally.
hiCUDA: a high-level directive-based language for GPU programming
- Computer Science, GPGPU-2
- 2009
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate…
GpuCV: an opensource GPU-accelerated framework for image processing and computer vision
- Computer Science, ACM Multimedia
- 2008
The GpuCV framework transparently manages hardware capabilities, data synchronization, activation of low-level GLSL and CUDA programs, and on-the-fly benchmarking and switching to the most efficient implementation, and offers a set of image processing operators with GPU acceleration.
Towards a general framework for FPGA based image processing using hardware skeletons
- Computer Science, Parallel Comput.
- 2002
CUDA-Lite: Reducing GPU Programming Complexity
- Computer Science, LCPC
- 2008
CUDA-lite, an enhancement to CUDA, is presented, along with preliminary results indicating that auto-generated code can have performance comparable to hand coding.