Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

  • Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman P. Amarasinghe
  • 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel commands to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based… 

Stripe: Tensor Compilation via the Nested Polyhedral Model

This model provides an underlying framework for an intermediate representation (IR) called Stripe, amenable to standard compiler techniques while naturally modeling key aspects of modern ML computing, which enables a compiler for ML in the style of LLVM that allows independent development of algorithms, optimizations, and hardware accelerators.

Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection

This work introduces the constraint tree abstraction, which may be generated by a non-linear optimizer and injected into the polyhedral optimization process to build better solutions, and shows how to benefit from such a mechanism to generate efficient code for GPUs in the context of AI/DL operators.

POSTER: A Polyhedral+Dataflow Intermediate Language for Performance Exploration

  • Eddie C. Davis, C. Olschanowsky
  • Computer Science
    2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
  • 2019
Introduces a compiler intermediate language designed for dataflow optimizations within a polyhedral framework, which enables a broad range of optimizations by allowing each layer to be transformed independently while respecting dependences.

AKG: automatic kernel generation for neural processing units using polyhedral transformations

AKG is presented, a tensor compiler for NPUs that leverages polyhedral schedulers to perform a much wider class of transformations, and extends the semantics of the polyhedral representation to combine complex tiling techniques and hierarchical fusion strategies.

Scalable Polyhedral Compilation, Syntax vs. Semantics: 1–0 in the First Round

Introduces a family of techniques called offline statement clustering, which integrates transparently into the flow of a state-of-the-art polyhedral compiler and can reduce the scheduling time by a factor of 6 without inducing a significant loss in optimization opportunities.

Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques

A high-level polyhedral compiler that takes a high-level algorithm description, which can be written manually or extracted from a TensorFlow computation graph, and generates input to the low-level C-based compiler.

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Fireiron is introduced, a DSL and compiler which allows the specification of high-performance GPU implementations as compositions of simple and reusable building blocks, and shows how to use Fireiron to optimize matrix multiplication implementations, achieving performance matching hand-coded CUDA kernels, even when using specialised hardware.

Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms

This work develops a novel code generation approach based on a generic OpenCL implementation that efficiently exploits OpenCL's abstract platform and memory model, generically for arbitrary MDH functions, by incorporating a parameterized parallelization and tiling strategy on both layers of OpenCL's two models and in all dimensions of the multi-dimensional input.

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

Compiler algorithms to automatically generate high-performance implementations of DL primitives that closely match the performance of hand-optimized libraries, and a flexible framework where it is possible to plug in library implementations of the same in lieu of a subset of the loops.

Towards a Domain-Extensible Compiler: Optimizing an Image Processing Pipeline on Mobile CPUs

  • T. Koehler, Michel Steuwer
  • Computer Science
    2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
  • 2021
This paper shows how to extend a unifying domain-extensible compiler with domain-specific as well as hardware-specific optimizations, with results showing that the code generated for the Harris operator outperforms the image processing library OpenCV by up to 16× and achieves performance close to, or even up to 1.4× better than, the state-of-the-art image processing compiler Halide.



Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation

Presents Polly, an infrastructure for polyhedral optimizations on the compiler's internal, low-level intermediate representation (IR), along with an interface for connecting external optimizers and a novel way of using the parallelism they introduce to generate SIMD and OpenMP code.

A practical automatic polyhedral parallelizer and locality optimizer

An automatic polyhedral source-to-source transformation framework that can optimize regular programs for parallelism and locality simultaneously and is implemented in a tool that automatically generates OpenMP parallel code from C program sections.

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

PENCIL, a rigorously defined subset of GNU C99 (enriched with additional language constructs) that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators, is presented.

Code generation in the polyhedral model is easier than you think

  • C. Bastoul
  • Computer Science
    Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
  • 2004
A general transformation framework able to deal with nonunimodular, noninvertible, nonintegral or even nonuniform functions is discussed and several improvements to a state-of-the-art code generation algorithm are presented.

Decoupling algorithms from schedules for easy optimization of image processing pipelines

This work proposes a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity, and demonstrates the power of this representation by expressing a range of recent image processing applications in an embedded domain-specific language called Halide and compiling them for ARM, x86, and GPUs.

CHiLL : A Framework for Composing High-Level Loop Transformations

A general and robust loop transformation framework that enables compilers to generate efficient code on complex loop nests and shows performance results on automatically generated code for the Pentium M and MIPS R10000 that are comparable to the best hand-tuned codes, and significantly better than the native compilers.

Distributed Halide

This work presents an extension to Halide to support distributed-memory parallel execution of complex stencil pipelines, allowing expression of complex computation and communication strategies.

PolyMage: Automatic Optimization for Image Processing Pipelines

This is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically, and it achieves performance up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines.

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Presents a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation that describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.

Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies

This work leverages algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler, using a semi-automatic optimization approach to demonstrate that current compilers suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework.