PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation

@article{Kloeckner2012PyCUDAAP,
  title={PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation},
  author={Andreas Kl{\"o}ckner and Nicolas Pinto and Yunsup Lee and Bryan Catanzaro and Paul Ivanov and Ahmed Fasih},
  journal={Parallel Comput.},
  year={2012},
  volume={38},
  pages={157--174}
}
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to…
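The run-time code generation (RTCG) approach the abstract describes can be illustrated with a minimal sketch: the scripting language (Python) assembles specialized GPU kernel source as a string at run time. The kernel name, template, and helper function below are illustrative assumptions, not code from the paper; with PyCUDA, the generated source would be handed to `pycuda.compiler.SourceModule`, but the sketch itself only generates source and so runs without a GPU.

```python
from string import Template

# Minimal RTCG sketch: Python builds CUDA C kernel source at run time,
# specialized to the operation at hand.  Names here are illustrative.
KERNEL_TEMPLATE = Template("""
__global__ void ${name}(float *out, const float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] ${op} b[i];
}
""")

def make_elementwise_kernel(name: str, op: str) -> str:
    """Generate CUDA C source for an elementwise binary operation."""
    return KERNEL_TEMPLATE.substitute(name=name, op=op)

src = make_elementwise_kernel("vec_add", "+")
print(src)

# With PyCUDA installed and a GPU available, one would continue roughly as:
#   from pycuda.compiler import SourceModule
#   mod = SourceModule(src)
#   vec_add = mod.get_function("vec_add")
```

Because the kernel text is an ordinary Python string, the host program can vary the operation, data type, or loop structure per call site, which is the flexibility the paper attributes to scripting-based RTCG.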
Loo.py: transformation-based code generation for GPUs and CPUs
Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model, providing a convenient way to capture, parametrize, and re-unify the growth among code variants.
Transparent GPU Execution of NumPy Applications
In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the…
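A transparent NumPy back-end of the kind described above hinges on intercepting array operations. The tiny class below is a hypothetical illustration (not that library's actual API) of recording an expression tree from overloaded operators; a code generator could then compile the recorded expression into a single fused GPU kernel rather than one kernel per operation.

```python
# Hypothetical sketch: defer array operations by recording them as an
# expression string, from which a fused kernel could later be generated.
class LazyArray:
    def __init__(self, name):
        self.expr = name

    def _binop(self, other, op):
        rhs = other.expr if isinstance(other, LazyArray) else repr(other)
        out = LazyArray.__new__(LazyArray)
        out.expr = f"({self.expr} {op} {rhs})"
        return out

    def __add__(self, other):
        return self._binop(other, "+")

    def __mul__(self, other):
        return self._binop(other, "*")

a, b, c = LazyArray("a"), LazyArray("b"), LazyArray("c")
result = (a + b) * c
print(result.expr)  # prints ((a + b) * c), the expression a generator would compile
```

Deferring evaluation this way is what lets such back-ends keep data on the GPU across several NumPy-level operations instead of shuttling it back and forth per operator.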
GPU Array Access Auto-Tuning
The MATOG auto-tuner, which automatically optimizes array access for NVIDIA CUDA applications, is introduced; it achieves equal or even better performance than hand-optimized code and provides performance portability across different GPU types (low-, mid-, high-end and HPC) and generations.
Multi-Stage Programming for GPUs in Modern C++ using PACXX
Writing and optimizing programs for high performance on systems with GPUs remains a challenging task even for expert programmers. One promising optimization technique is to evaluate parts of the…
Multi-stage programming for GPUs in C++ using PACXX
This paper describes PACXX, an approach to GPU programming in C++ using convenient features of the modern C++14 standard: type deduction, lambda expressions, and algorithms from the standard template library (STL).
PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems
This paper proposes PySchedCL, a framework that explores fine-grained, concurrency-aware scheduling decisions to harness the power of heterogeneous CPU/GPU architectures efficiently, a feature not provided by existing HPC frameworks.
Contract-based general-purpose GPU programming
SafeGPU is a programming library that aims to strike a balance between programmer productivity and performance by making GPU data-parallel operations accessible from within a classical object-oriented programming language; runtime contract checking turns out to be feasible, as the contracts themselves can be executed on the GPU.
The challenges of writing portable, correct and high performance libraries for GPUs
This paper aims to deliver working, efficient GPU code in a library that is downloaded and run by many different users; it targets the linear solver module, including Conjugate Gradient, Jacobi, and MinRes solvers for sparse matrices.
Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs
This thesis presents and quantitatively evaluates a GPU programming system that provides a high-level abstraction to facilitate the use of GPUs for general-purpose array processing, allowing programmers to focus on what to program on GPUs instead of how to program them.
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
OMB-Py, a set of Python extensions to the open-source OSU Micro-Benchmark (OMB) suite, is aimed at evaluating the communication performance of MPI-based parallel applications in Python; the results reveal that mpi4py introduces only a small overhead compared to native MPI libraries.

References

Showing the first 10 of 68 references.
hiCUDA: a high-level directive-based language for GPU programming
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate…
Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform
The Cell BE processor is capable of achieving very high levels of performance via parallel computation. The processors in video accelerators, known as GPUs, are also high performance parallel…
Brook for GPUs: stream computing on graphics hardware
This paper presents Brook for GPUs, a system for general-purpose computation on programmable graphics hardware that abstracts and virtualizes many aspects of graphics hardware, and presents an analysis of the effectiveness of the GPU as a compute engine compared to the CPU.
Accelerator: using data parallelism to program GPUs for general-purpose uses
This work describes Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead of C, and compares the performance of Accelerator versions of the benchmarks against hand-written pixel shaders.
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
This paper presents a programming interface called JCUDA that can be used by Java programmers to invoke CUDA kernels, and shows that this interface can deliver significant performance improvements to Java programmers.
BSGP: bulk-synchronous GPU programming
To test BSGP's code efficiency and ease of programming, a variety of GPU applications were implemented, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.
Implementing an embedded GPU language by combining translation and generation
This paper describes how a domain-specific language for image processing in Python can be compiled for execution on high-speed graphics processing units, and proposes a strategy that combines translation and generation, thereby achieving the benefits of both.
Special Issue on Program Generation, Optimization, and Platform Adaptation
This special issue of the Proceedings presents an overview of recent research on new methodologies for the design, development, and optimization of high-performance software libraries and applications, and contains 13 invited papers grouped into four main areas.
Optimizing Compilers for Modern Architectures: A Dependence-based Approach
This book provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory-hierarchy management, and instruction scheduling.
The Graphics Card as a Stream Computer
Massive data sets have radically changed our understanding of how to design efficient algorithms; the streaming paradigm, whether in terms of number of passes of an external memory algorithm, or…