Thibaut Lutz

Learn More
GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity. In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best(More)
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to attempt to address this issue. However, they rely on device-specific heuristics or hard-coded library(More)
C++11 introduced a set of new features to extend the core language and the standard library. Amongst the new features are basic blocks for concurrency management like threads and atomic operation support, and a new syntax to declare single purpose, one off functions, called lambda functions, which integrate nicely to the Standard Template Library (STL). The(More)
State of the art automatic optimization of OpenCL applications focuses on improving the performance of individual compute kernels. Programmers address opportunities for inter-kernel optimization in specific applications by ad-hoc hand tuning: manually fusing kernels together. However, the complexity of interactions between host and kernel code makes this(More)
  • 1