GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU

Gert-Jan van den Braak, C. Nugteren, B. Mesman, H. Corporaal
Voting algorithms, such as histograms and Hough transforms, are frequently used in various domains, including statistics and image processing. Algorithms in these domains may be accelerated using GPUs. Implementing voting algorithms efficiently on a GPU, however, is far from trivial due to irregularities and unpredictable memory accesses. Existing GPU implementations therefore target only specific voting algorithms, whereas this work proposes a methodology which targets voting…
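To make the irregular, data-dependent accesses mentioned in the abstract concrete, here is a minimal CPU-side sketch in C++. It is not taken from the GPU-Vote paper: the names `histogram` and `serialized_steps`, the 256-bin layout, and the group width of 32 (standing in for a GPU warp) are all assumptions chosen for illustration. The second function is only a rough cost model of how many colliding votes a group of threads might have to serialize.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is a CPU-side model, not GPU-Vote code.
constexpr std::size_t kNumBins = 256;   // one bin per 8-bit input value
constexpr std::size_t kGroupSize = 32;  // assumed warp width

// Voting step: every input element casts one vote for the bin it maps to.
std::array<std::uint32_t, kNumBins> histogram(const std::vector<std::uint8_t>& data) {
    std::array<std::uint32_t, kNumBins> bins{};
    for (std::uint8_t v : data) {
        ++bins[v];  // data-dependent address: the source of the irregular accesses
    }
    return bins;
}

// For each group of kGroupSize consecutive inputs, find the largest number of
// votes aimed at a single bin. On a GPU such votes would have to be serialized
// (for example by atomic updates), so the sum serves as a rough cost proxy.
std::size_t serialized_steps(const std::vector<std::uint8_t>& data) {
    std::size_t total = 0;
    for (std::size_t start = 0; start < data.size(); start += kGroupSize) {
        std::array<std::uint32_t, kNumBins> local{};
        std::uint32_t worst = 0;
        const std::size_t end = std::min(start + kGroupSize, data.size());
        for (std::size_t i = start; i < end; ++i) {
            worst = std::max(worst, ++local[data[i]]);
        }
        total += worst;
    }
    return total;
}
```

Under this model, an input in which every element maps to the same bin costs one step per vote, while an input whose values are all distinct within each group costs one step per group. That dependence on the input data is what makes the cost of voting hard to predict.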
Improving GPU Performance: Reducing Memory Conflicts and Latency
A set of software techniques for improving the parallel updating of output bins in so-called voting algorithms, such as histogram and Hough transform, is analyzed, implemented, and optimized on GPUs.
Simulation and architecture improvements of atomic operations on GPU scratchpad memory
This paper proposes using a hash function in the addressing of both the banks and the locks of the scratchpad memory in GPGPU-Sim, which reduces serialization of threads and yields a speed-up in histogram and Hough transform applications at minimal hardware cost.
In-place transposition of rectangular matrices on accelerators
This paper presents the first known in-place matrix transposition approach for GPUs, based on a novel 3-stage transposition algorithm in which each stage is performed using an elementary tile-wise transposition, achieving a 3× speedup over a traditional 4-stage algorithm.
Massive atomics for massive parallelism on GPUs
A novel approach to deal with shared data object management for reduction type parallelism on GPUs that exploits fine-grained parallelism while at the same time maintaining good programmability is proposed.
Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs
This paper explores the use of configurable bit-vector and bitwise XOR-based hash functions to evenly distribute memory addresses of the access patterns over the memory banks, reducing the number of bank conflicts.
Accelerating Sequential Computer Vision Algorithms Using Commodity Parallel Hardware
The last decade has seen an increasing demand from the industrial field of computerized visual inspection. Applications rapidly become more complex and often with more demanding real time…
Parallel implementation of a real-time high dynamic range video system
The use of the parallel processing capabilities of a graphics chip to increase the processing speed of a high dynamic range (HDR) video system is described, along with the modifications to the algorithms that are necessary to enable parallel processing.
Improving the Programmability of GPU Architectures
Parallel implementation of the multi-view image segmentation algorithm using the Hough transform
This paper reports a parallel implementation of a multi-view image segmentation algorithm that segments the corresponding three-dimensional scene by means of the Hough space.
Cross-Layer Energy Efficient Resource Allocation in PD-NOMA Based H-CRANs: Implementation via GPU
This paper proposes a cross-layer energy-efficient resource allocation and remote radio head (RRH) selection algorithm for heterogeneous traffic in power-domain non-orthogonal multiple access (PD-NOMA) based heterogeneous cloud radio access networks (H-CRANs), improving the system energy efficiency.


Efficient Histogram Algorithms for NVIDIA CUDA Compatible Devices
Two efficient histogram algorithms designed for NVIDIA’s compute unified device architecture (CUDA) compatible graphics processor units (GPUs) are presented, showing that the speed of histogram calculations can be improved by up to 30 times compared to a CPU-based implementation.
High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs
This paper presents two novel histogramming methods, both achieving a higher performance and predictability than existing methods and guarantees to be fully data independent.
Fast Hough Transform on GPUs: Exploration of Algorithm Trade-Offs
The results show that optimizing the GPU code for speed can achieve a speed-up over naive GPU code of about 10×, and the implementation which achieves a constant processing time is quicker for about 20% of the images.
On the computation of the Circle Hough Transform by a GPU rasterizer
This paper presents an alternative for a fast computation of the Hough transform by taking advantage of commodity graphics processors that provide a unique combination of low cost and high…
OpenVIDIA: parallel GPU computer vision
This paper proposes using GPUs in approximately the reverse way: to assist in "converting pictures into numbers" (i.e. computer vision) and provides a simple API which implements some common computer vision algorithms.
NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing
For four years, NVIDIA has waged a campaign to redefine the role of GPUs, harnessing the massively parallel-processing resources originally designed for 3D graphics to apply GPUs to a much broader range of computing applications beyond graphics.
Color model-based real-time learning for road following
A vision system capable of accurately segmenting unstructured, nonhomogeneous roads of arbitrary shape under various lighting conditions is proposed, and preliminary testing demonstrates the system's effectiveness on roads not handled by previous systems.
A Method of Fast and Robust for Traffic Sign Recognition
A new method of rejecting non-signs is presented, which improves the recognition rate in complex outdoor scenes; the algorithm includes two stages: traffic sign detection and recognition.
Multimodality image registration by maximization of mutual information
The results demonstrate that subvoxel accuracy with respect to the stereotactic reference solution can be achieved completely automatically and without any prior segmentation, feature extraction, or other preprocessing steps, which makes this method very well suited for clinical applications.
On optimal and data based histograms
In this paper the formula for the optimal histogram bin width is derived which asymptotically minimizes the integrated mean squared error. Monte Carlo methods are used to verify the…