Corpus ID: 5030193

Autotuning OpenCL Workgroup Size for Stencil Patterns

  • Chris Cummins, Pavlos Petoumenos, Michel Steuwer, H. Leather
  • Published 2015
  • Computer Science
  • ArXiv
  • Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels, and requires knowledge of the underlying hardware, the data being operated on, and the implementation of the kernel. This makes portable performance of OpenCL programs a challenging goal, since simple heuristics and statically chosen values fail to exploit the available performance. To address this, we propose the use of machine learning-enabled autotuning to automatically predict workgroup sizes for…
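The tuning problem the abstract describes can be illustrated with a minimal exhaustive-search sketch (all function names and the cost model below are hypothetical illustrations; the paper's actual approach uses machine learning to predict the workgroup size rather than searching, and a real tuner would time actual OpenCL kernel launches):

```python
import itertools

def candidate_workgroup_sizes(max_size=256):
    """Enumerate 2D workgroup sizes (w, h) whose product stays within
    a device limit (a hypothetical cap of 256 work-items)."""
    return [(w, h)
            for w, h in itertools.product([1, 2, 4, 8, 16, 32, 64], repeat=2)
            if w * h <= max_size]

def autotune(kernel_runtime, max_size=256):
    """Return the workgroup size minimizing a measured runtime.
    `kernel_runtime` stands in for timing a real kernel launch."""
    return min(candidate_workgroup_sizes(max_size), key=kernel_runtime)

# Mock cost model standing in for real measurements: favors wide,
# shallow workgroups, as is common for row-major stencil kernels.
def mock_runtime(wg):
    w, h = wg
    return abs(w - 64) + abs(h - 4)

best = autotune(mock_runtime)
print(best)  # (64, 4)
```

Exhaustive search like this finds the optimum for one device and one kernel but must be rerun whenever either changes; the predictive model proposed in the paper is meant to avoid exactly that cost.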
    23 Citations
    Efficient and Portable Workgroup Size Tuning
    • C. Yu, S. Tsao
    • Computer Science
    • IEEE Transactions on Parallel and Distributed Systems
    • 2020
    A Sampling Based Strategy to Automatic Performance Tuning of GPU Programs
    • W. Feng, T. Abdelrahman
    • Computer Science
    • 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
    • 2017
    Use of Synthetic Benchmarks for Machine-Learning-Based Performance Auto-Tuning
    • T. D. Han, T. Abdelrahman
    • Computer Science
    • 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
    • 2017
    Minimizing the cost of iterative compilation with active learning
    Deep Learning for Compilers
    Reducing the Cost of Heuristic Generation with Machine Learning
    Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability
    CLTune: A Generic Auto-Tuner for OpenCL Kernels
    • C. Nugteren, V. Codreanu
    • Computer Science
    • 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip
    • 2015
    A case for machine learning to optimize multicore performance
    An auto-tuning framework for parallel multicore stencil computations
    PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures
    Raced profiles: efficient selection of competing compiler optimizations
    High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library
    PARTANS: An autotuning framework for stencil computation on multi-GPU systems
    Auto-generation and auto-tuning of 3D stencil codes on GPU clusters
    SkelCL - A Portable Skeleton Library for High-Level GPU Programming