Compile-time GPU memory access optimizations

@article{Braak2010CompiletimeGM,
  title={Compile-time GPU memory access optimizations},
  author={Gert-Jan van den Braak and Bart Mesman and Henk Corporaal},
  journal={2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation},
  year={2010},
  pages={200--207}
}
In the last three years, GPUs have increasingly been used for general-purpose applications rather than only for computer graphics. Programming these GPUs is a big challenge: in current GPUs the main bottleneck for many applications is not computing power but memory access bandwidth. Two compile-time optimizations are presented in this paper to deal with the two most important memory access issues. To describe these optimizations, a new notation of the parallel execution of GPU programs…
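The abstract's claim that memory access bandwidth, not compute, is the bottleneck comes down to how many memory transactions a warp's accesses generate. A minimal sketch (illustrative only, not from the paper; the 32-thread warp size and 128-byte segment size are assumptions matching common NVIDIA hardware):

```python
# Hypothetical sketch (not from the paper): count the memory
# transactions one warp's accesses generate, to show why
# coalescing matters. Assumes 32-thread warps and 128-byte
# transaction segments.

def transactions_per_warp(addresses, segment_bytes=128):
    """Number of memory segments touched by one warp's accesses."""
    return len({addr // segment_bytes for addr in addresses})

# Coalesced: 32 threads read consecutive 4-byte floats.
coalesced = [tid * 4 for tid in range(32)]
# Strided: each thread reads 4 bytes, 128 bytes apart.
strided = [tid * 128 for tid in range(32)]

print(transactions_per_warp(coalesced))  # 1 segment
print(transactions_per_warp(strided))    # 32 segments
```

The strided pattern needs 32 times the bandwidth for the same payload, which is the kind of issue compile-time access reordering targets.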
Optimized two-level parallelization for GPU accelerators using the polyhedral model
TLDR
This paper proposes a novel compiler optimization algorithm for GPU parallelism based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks.
Improving GPU Performance: Reducing Memory Conflicts and Latency
TLDR
A set of software techniques to improve the parallel updating of output bins in so-called 'voting algorithms', such as histogram and Hough transform, is analyzed, implemented and optimized on GPUs.
Pragma Directed Shared Memory Centric Optimizations on GPUs
  • Jing Li, Lei Liu, +4 authors Chengyong Wu
  • Computer Science
  • Journal of Computer Science and Technology
  • 2016
TLDR
A data-centric approach to shared-memory optimization on GPUs is proposed: a pragma extension to OpenACC is designed to convey programmers' data-management hints to the compiler, and a compiler framework is devised to automatically select optimal parameters for shared arrays using the polyhedral model.
Efficiency of GPUs for Relational Database Engine Processing
TLDR
This paper proposes to boost an existing RDBMS by making it able to use hardware architectures with high memory bandwidth such as GPUs, and presents a solution named CuDB, which focuses on the technical specificities of GPUs most relevant for designing highly energy-efficient database processing solutions.
RT-CUDA: A Software Tool for CUDA Code Restructuring
TLDR
A restructuring tool (RT-CUDA) that takes a C-like program and some user directives as compiler hints and produces optimized CUDA code, helping scientists develop parallel simulators (e.g., reservoir simulators, molecular dynamics) without exposing them to the complexity of GPU and CUDA programming.
A novel graphics processor architecture based on partial stream rewriting
TLDR
This work models the complete rendering pipeline as a functional program, which is represented as a stream of tokens and iteratively modified by a set of rewriting rules; this enables dynamic thread creation, lock-free synchronization and light-weight scheduling based on pattern matching.
An Optimization Compiler Framework Based on Polyhedron Model for GPGPUs
TLDR
An optimization compiler framework based on the polyhedron model for GPGPUs is described, which bridges the speed gap between the GPU cores and the off-chip memory and improves the overall performance of GPU systems.
An efficient GPU acceleration format for finite element analysis
TLDR
Numerical results show that the proposed GPU-accelerated storage format, MCTO-applied FEM, is about 10 times faster than conventional FEM on a CPU, and faster than other row-major ordering formats on a GPU.
The Storage Formats for Accelerating SMVP on a GPU
TLDR
The research in this paper provides fast format selection, enabling low storage space and efficient memory accesses, for numerical methods that accelerate SMVP.
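The storage formats these SMVP entries compare all trade memory footprint against access efficiency; CSR (compressed sparse row) is the usual baseline. A minimal sketch of a CSR sparse matrix-vector product (the layout is standard, but the function and variable names here are illustrative only):

```python
# Hypothetical sketch of SMVP with the CSR (compressed sparse row)
# format: nonzero values, their column indices, and per-row offsets.

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # row_ptr[r]:row_ptr[r+1] delimits row r's nonzeros.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

On a GPU, the per-row inner loop gives threads irregular, row-length-dependent work, which is exactly why alternative formats (ELL, MCTO-style reorderings) are studied.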
Improving Performances of an Embedded Relational Database Management System with a Hybrid CPU/GPU Processing Engine
TLDR
This paper proposes to upgrade SQLite, the most widely deployed embedded RDBMS, with a hybrid CPU/GPU processing engine combined with appropriate data management. In the resulting solution, named CuDB, massively parallel processing is combined with strategic data placement closer to the computing units.

References

SHOWING 1-10 OF 15 REFERENCES
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
TLDR
A simple analytical model is proposed that estimates the execution time of massively parallel programs by considering the number of running threads and the memory bandwidth; it estimates the cost of memory requests and thereby the overall execution time of a program.
CUDA-Lite: Reducing GPU Programming Complexity
TLDR
CUDA-lite, an enhancement to CUDA, is presented, along with preliminary results indicating that auto-generated code can have performance comparable to hand-written code.
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
TLDR
This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications, and identifies several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance.
Memory access coalescing: a technique for eliminating redundant memory accesses
TLDR
A general code improvement algorithm that transforms code to better exploit the available memory bandwidth on existing microprocessors as well as wide-bus machines; the effectiveness of the transformation varied significantly with the instruction-set architecture of the tested platform.
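The core idea behind memory access coalescing is to merge contiguous narrow accesses into one wide access at compile time. A minimal sketch (illustrative only, not this paper's algorithm) over a list of (address, size) loads:

```python
# Hypothetical sketch (not the paper's algorithm): merge adjacent
# narrow loads into wide accesses, the core transformation behind
# memory access coalescing.

def coalesce(loads):
    """Merge contiguous (address, size) loads into wide accesses."""
    merged = []
    for addr, size in sorted(loads):
        # Extend the previous access if this load starts right after it.
        if merged and merged[-1][0] + merged[-1][1] == addr:
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((addr, size))
    return merged

# Four 4-byte loads at consecutive addresses become one 16-byte load.
print(coalesce([(0, 4), (4, 4), (8, 4), (12, 4)]))  # [(0, 16)]
```

A real compiler must additionally prove the loads alias no intervening stores and respect the target's alignment and maximum access width, which this sketch omits.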
NVIDIA Tesla: A Unified Graphics and Computing Architecture
TLDR
To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.
High performance compilers for parallel computing
TLDR
This book discusses programming language features, data dependence, dependence system solvers, and run-time dependence testing for high-performance systems.
Roofline: an insightful visual performance model for multicore architectures
TLDR
The Roofline model offers insight on how to improve the performance of software and hardware in the rapidly changing world of connected devices.
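The Roofline model's central bound is that attainable performance is the minimum of the compute roof and the bandwidth roof scaled by a kernel's operational intensity. A minimal sketch (the peak numbers below are made up for illustration):

```python
# Hypothetical sketch of the Roofline bound: attainable GFLOP/s is
# capped by either peak compute or peak bandwidth times operational
# intensity (FLOPs per byte moved). Peak figures are illustrative.

def roofline(peak_gflops, peak_gbps, intensity_flops_per_byte):
    """Attainable GFLOP/s for a kernel of given operational intensity."""
    return min(peak_gflops, peak_gbps * intensity_flops_per_byte)

# Low-intensity kernel: limited by the bandwidth roof.
print(roofline(1000.0, 100.0, 0.5))   # 50.0 GFLOP/s
# High-intensity kernel: limited by the compute roof.
print(roofline(1000.0, 100.0, 20.0))  # 1000.0 GFLOP/s
```

The crossover point (here 10 FLOPs/byte) is the "ridge": below it, memory optimizations like coalescing pay off; above it, only compute improvements help.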
OpenCL Programming Guide for the CUDA Architecture, NVIDIA Corporation
  • 2009
Optimizing Matrix Transpose in CUDA, NVIDIA Corporation
  • 2009
Programming Guide - Version 2.2, NVIDIA Corporation
  • 2009