Corpus ID: 22782302

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

Xiaohui Chen, Marc Moreno Maza, Jeeva Paudel, Ning Xie
This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the hardware characteristics. The expected benefit is to improve robustness, portability and efficiency of the generated computer programs. We address these requirements by: (1) treating machine and program parameters as unknown symbols during code generation… 
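The following toy sketch illustrates the general idea of keeping machine parameters symbolic: a compiler emits a case discussion ahead of time, and the concrete kernel variant is chosen at launch time, once the values are known. It is not the paper's algorithm. The variant names, the resource formulas, and the parameters B (block side length), s (elements per thread), R (max threads per block) and Z (shared-memory words per block) are all hypothetical.

```python
from typing import Callable, List, Tuple

# A predicate over program parameters (B, s) and machine parameters (R, Z).
# All formulas below are hypothetical, chosen only for illustration.
Constraint = Callable[[int, int, int, int], bool]

# Case discussion as it might be emitted ahead of time, ordered by preference.
VARIANTS: List[Tuple[str, Constraint]] = [
    # Tiled kernel, s elements per thread: needs 2 * B * B * s shared-memory words.
    ("tiled_granularity_s",
     lambda B, s, R, Z: B * B <= R and 2 * B * B * s <= Z),
    # Tiled kernel, one element per thread: needs 2 * B * B shared-memory words.
    ("tiled_granularity_1",
     lambda B, s, R, Z: B * B <= R and 2 * B * B <= Z),
    # Fallback without shared memory, assumed to be always launchable.
    ("untiled", lambda B, s, R, Z: True),
]


def select_variant(B: int, s: int, R: int, Z: int) -> str:
    """Return the first variant whose constraints hold for the given values."""
    for name, holds in VARIANTS:
        if holds(B, s, R, Z):
            return name
    raise ValueError("no variant applies")  # unreachable while the fallback exists
```

For example, `select_variant(16, 4, 1024, 12288)` returns `"tiled_granularity_s"`, while lowering Z to 1024 makes it return `"tiled_granularity_1"`. The case table itself is fixed before the machine is known; only its evaluation is deferred.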



Automatically Tuned Linear Algebra Software
An approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units, applied to the widely used linear algebra kernels known as the Basic Linear Algebra Subroutines (BLAS).
Automatic C-to-CUDA Code Generation for Affine Programs
An automatic code transformation system that generates parallel CUDA code from sequential C input for regular (affine) programs; the generated code performs quite close to hand-optimized CUDA code and considerably better than the benchmarks' performance on a multicore CPU.
Auto-tuning a high-level language targeted to GPU codes
This work performs auto-tuning on a large optimization space on GPU kernels, focusing on loop permutation, loop unrolling, tiling, and specifying which loop(s) to parallelize, and shows results on convolution kernels, codes in the PolyBench suite, and an implementation of belief propagation for stereo vision.
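As a toy illustration of empirical auto-tuning in general (not the system described in this paper), the sketch below times one parameterized kernel for several tile sizes and keeps the fastest. The kernel is a tiled matrix multiplication in pure Python rather than a GPU kernel, and the function names and candidate list are hypothetical; the point is the search loop, not the speed of the kernel.

```python
import time
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Return A * B for square matrices, with the i, k and j loops tiled by `tile`."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune_tile(a: Matrix, b: Matrix, candidates: Sequence[int],
                  repeats: int = 3) -> Tuple[int, float]:
    """Time every candidate tile size; return (fastest tile, its best time in seconds)."""
    best_tile, best_time = candidates[0], float("inf")
    for tile in candidates:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, tile)
            timings.append(time.perf_counter() - start)
        elapsed = min(timings)  # the minimum is the least noisy estimate
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile, best_time
```

A call such as `autotune_tile(a, b, [4, 8, 16, 32])` returns whichever tile size ran fastest on the current machine, so the answer can legitimately differ between machines; real auto-tuners search much larger spaces (loop order, unrolling, parallelized loops) in the same spirit.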
Polyhedral parallel code generation for CUDA
A novel source-to-source compiler called PPCG is presented, which introduces a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs.
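A simplified sketch of two-level tiling for a GPU may help here: tiles of the iteration space are assigned to thread blocks, and the points inside a tile are distributed cyclically over the threads of the block. This is a generic illustration under those assumptions, not PPCG's actual scheme; the function names and the cyclic distribution are hypothetical choices.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]
Owner = Tuple[Tuple[int, int], Tuple[int, int]]  # ((block_x, block_y), (thread_x, thread_y))


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def map_iterations(n: int, m: int, tile: Tuple[int, int],
                   threads: Tuple[int, int]) -> Dict[Point, Owner]:
    """Assign each iteration (i, j) of `for i < n: for j < m` to a (block, thread).

    Level 1: the iteration space is cut into tile[0] x tile[1] tiles, one per block.
    Level 2: inside a tile, thread (tx, ty) runs the points whose offsets are
    congruent to (tx, ty) modulo the thread-block shape (cyclic distribution).
    """
    owner: Dict[Point, Owner] = {}
    for bx in range(ceil_div(n, tile[0])):
        for by in range(ceil_div(m, tile[1])):
            i_end = min((bx + 1) * tile[0], n)
            j_end = min((by + 1) * tile[1], m)
            for tx in range(threads[0]):
                for ty in range(threads[1]):
                    for i in range(bx * tile[0] + tx, i_end, threads[0]):
                        for j in range(by * tile[1] + ty, j_end, threads[1]):
                            if (i, j) in owner:  # each point must be claimed once
                                raise RuntimeError(f"{(i, j)} assigned twice")
                            owner[(i, j)] = ((bx, by), (tx, ty))
    return owner
```

With `map_iterations(10, 7, (4, 3), (2, 2))`, iteration (5, 4) lands in block (1, 1) on thread (1, 1), and all 70 iterations are covered exactly once, including the partial tiles at the boundary.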
Automated Code Engine for Graphical Processing Units: Application to the Effective Core Potential Integrals and Gradients.
The best code candidate is observed to vary with angular momentum, floating-point precision, and the type of GPU used, which shows that ACE may be a powerful tool for adapting to fast-evolving GPU architectures.
hiCUDA: a high-level directive-based language for GPU programming
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate
A script-based autotuning compiler system to generate high-performance CUDA code
Presents a Transformation Strategy Generator, a meta-optimizer that generates a set of transformation recipes (descriptions of how the sequential code is mapped to parallel CUDA code), which together form a search space of possible implementations.
MetaFork: a compilation framework for concurrency models targeting hardware accelerators and its application to the generation of parametric CUDA kernels
This paper presents the accelerator model of MetaFork together with the software framework that automatically generates CUDA code from annotated MetaFork programs, taking advantage of quantifier elimination and its implementation in the RegularChains library in Maple.
Compiling a High-Level Directive-Based Programming Model for GPGPUs
An implementation of an open-source OpenACC compiler in a mainstream compiler framework (OpenUH, a branch of Open64) is presented to serve as compiler infrastructure for researchers who want to explore advanced compiler techniques, extend OpenACC to other programming languages, or build performance tools for OpenACC programs.
A PTX Code Generator for LLVM
This thesis develops an open-source PTX code generator (PTX is the assembly code for NVIDIA GPUs) based on the existing open-source LLVM compiler, and it achieves performance similar to that of the nvcc compiler.