Publications
Learning to optimize halide with tree search and random programs
TLDR
This work presents a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. The schedules it produces are on average almost twice as fast as those of the existing Halide autoscheduler without autotuning, and more than twice as fast with autotuning, making it the first automatic scheduling algorithm to significantly outperform human experts on average.
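As a rough illustration of the tree-search approach described in this TLDR, here is a minimal C++ sketch of a beam search over partial schedules guided by a cost model; `Schedule`, `enumerate_choices`, and `predicted_cost` are hypothetical placeholders, not the paper's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in: a partial schedule is just a list of integer-coded decisions.
struct Schedule { std::vector<int> decisions; };

// Stub: enumerate the legal next scheduling decisions for a partial schedule.
std::vector<Schedule> enumerate_choices(const Schedule& s) {
    std::vector<Schedule> next;
    for (int d = 0; d < 4; ++d) {          // pretend there are 4 options per step
        Schedule n = s;
        n.decisions.push_back(d);
        next.push_back(n);
    }
    return next;
}

// Stub for the learned cost model: lower predicted cost is better.
double predicted_cost(const Schedule& s) {
    double c = 0;
    for (int d : s.decisions) c += d;      // placeholder scoring
    return c;
}

// Beam search: keep only the beam_width most promising partial schedules per step.
Schedule beam_search(std::size_t beam_width, int max_steps) {
    std::vector<Schedule> beam{Schedule{}};
    for (int step = 0; step < max_steps; ++step) {
        std::vector<Schedule> candidates;
        for (const Schedule& s : beam)
            for (Schedule& n : enumerate_choices(s))
                candidates.push_back(std::move(n));
        std::sort(candidates.begin(), candidates.end(),
                  [](const Schedule& a, const Schedule& b) {
                      return predicted_cost(a) < predicted_cost(b);
                  });
        if (candidates.size() > beam_width) candidates.resize(beam_width);
        beam = std::move(candidates);
    }
    return beam.front();
}

int main() {
    Schedule best = beam_search(/*beam_width=*/8, /*max_steps=*/5);
    std::printf("best schedule has %zu decisions\n", best.decisions.size());
}
```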
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
TLDR
Tiramisu introduces a scheduling language with novel commands to explicitly manage the complexities that arise when targeting modern hardware such as multicores, GPUs, and distributed machines; it is designed for the areas of image processing, stencils, linear algebra, and deep learning.
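For a flavor of what explicit scheduling commands look like, the following C++ sketch mimics the style of a Tiramisu-like scheduling API; the class and method names are illustrative assumptions, not the library's verbatim interface.

```cpp
#include <cstdio>

// Illustrative only: these names mimic the style of a Tiramisu-like scheduling
// language but are assumptions, not the real API.
struct Computation {
    // The algorithm (what to compute) would be defined elsewhere,
    // e.g. C(i, j) = A(i, j) + B(i, j); here we only show schedule commands.
    void tile(int i, int j, int ti, int tj) { std::printf("tile loops %d,%d by %dx%d\n", i, j, ti, tj); }
    void parallelize(int loop)              { std::printf("parallelize loop %d\n", loop); }
    void vectorize(int loop, int width)     { std::printf("vectorize loop %d by %d\n", loop, width); }
    void gpu(int i, int j)                  { std::printf("map loops %d,%d to GPU blocks/threads\n", i, j); }
};

int main() {
    Computation C;
    // The same algorithm can be retargeted by changing only these commands.
    C.tile(0, 1, 32, 32);   // tile the i/j loops into 32x32 blocks
    C.parallelize(0);       // run the outer tile loop across CPU cores
    C.vectorize(1, 8);      // vectorize the innermost loop 8-wide
    // For a GPU target one would instead write something like: C.gpu(0, 1);
}
```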
GraphIt: a high-performance graph DSL
TLDR
GraphIt, a new DSL for graph computations, is introduced; it generates fast implementations for algorithms with different performance characteristics running on graphs of different sizes and structures, and outperforms the next fastest shared-memory frameworks on 24 out of 32 experiments.
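The core idea, keeping the graph algorithm separate from the traversal strategy used to execute it, can be sketched in plain C++; the push/pull switch below is a conceptual stand-in for the kind of scheduling choice such a DSL exposes, not GraphIt syntax.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Graph in adjacency-list form: out_edges[u] lists the successors of u.
struct Graph { std::vector<std::vector<int>> out_edges, in_edges; };

enum class Direction { Push, Pull };   // a "schedule" choice, separate from the algorithm

// Algorithm: one step of label propagation (each vertex takes the minimum label
// seen along an edge). Schedule: iterate over outgoing edges (push) or incoming
// edges (pull); the result is the same, the performance is not.
void propagate_step(const Graph& g, std::vector<int>& label, Direction dir) {
    std::vector<int> next = label;
    if (dir == Direction::Push) {
        for (std::size_t u = 0; u < g.out_edges.size(); ++u)
            for (int v : g.out_edges[u])
                next[v] = std::min(next[v], label[u]);
    } else {
        for (std::size_t v = 0; v < g.in_edges.size(); ++v)
            for (int u : g.in_edges[v])
                next[v] = std::min(next[v], label[u]);
    }
    label = std::move(next);
}

int main() {
    Graph g;
    g.out_edges = {{1}, {2}, {}};      // 0 -> 1 -> 2
    g.in_edges  = {{}, {0}, {1}};
    std::vector<int> label = {0, 1, 2};
    propagate_step(g, label, Direction::Push);   // schedule decision made here
    propagate_step(g, label, Direction::Pull);   // same algorithm, different traversal
    return label[2];                             // 0 after two propagation steps
}
```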
PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming
TLDR
PENCIL, a rigorously defined subset of GNU C99 enriched with additional language constructs that enable compilers to exploit parallelism and produce highly optimized code when targeting accelerators, is presented.
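A rough illustration of code in such a restricted, analyzable subset is given below; the pragma spelling is recalled from the PENCIL documents and should be treated as an assumption rather than the exact specification.

```cpp
// Illustrative only: a kernel written in the restricted, analyzable style that a
// PENCIL-like subset requires (no aliasing tricks, affine loop bounds and accesses).
// The pragma spelling is recalled from memory and may not match the spec exactly.
void scale_rows(int n, int m, const float* in, float* out, const float* coeff) {
#pragma pencil independent   // assumed construct: asserts the loop iterations are independent
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            out[i * m + j] = coeff[i] * in[i * m + j];
}
```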
VOBLA: a vehicle for optimized basic linear algebra
TLDR
VOBLA is compiled to PENCIL, a domain-independent intermediate language designed for efficient mapping to accelerator architectures such as GPGPUs, and the performance of OpenCL code generated by this compilation flow on ARM Mali, AMD Radeon, and AMD Opteron platforms is evaluated.
PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs
We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.
Tiramisu: A Code Optimization Framework for High Performance Systems
TLDR
Tiramisu is introduced, an optimization framework designed to generate efficient code for high-performance systems such as multicores, GPUs, FPGAs, distributed machines, or any combination of these; it introduces a novel four-level IR that allows full separation between algorithms, schedules, data layouts, and communication.
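One way to picture this separation is as four independent layers attached to a single computation. The sketch below is purely conceptual, with invented names, and is not Tiramisu's actual IR.

```cpp
#include <string>
#include <vector>

// Purely conceptual sketch of a four-level separation; the names are invented.

// Level 1: the algorithm, with no execution order or memory layout implied.
struct Algorithm     { std::string definition = "C(i,j) = A(i,j) + B(i,j)"; };

// Level 2: the schedule, i.e. how the iteration space is transformed and mapped.
struct Schedule      { std::vector<std::string> commands = {"tile(i,j,32,32)", "parallelize(i0)"}; };

// Level 3: the data layout, i.e. where each computed value lives in memory.
struct DataLayout    { std::string mapping = "C(i,j) -> buf_C[i*N + j]"; };

// Level 4: communication and synchronization, e.g. transfers between nodes or host and GPU.
struct Communication { std::vector<std::string> ops = {"send tile to node 1", "copy buf_C to host"}; };

// A full program combines the four layers; changing one layer (say, the schedule)
// does not require touching the others.
struct Program { Algorithm alg; Schedule sched; DataLayout layout; Communication comm; };

int main() { Program p; return static_cast<int>(p.sched.commands.size()); }
```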
Improved loop tiling based on the removal of spurious false dependences
TLDR
A compilation technique that safely ignores a large number of false dependences in order to enable loop-nest tiling in the polyhedral model is proposed and evaluated; it is based on a precise characterization of interferences between live-range intervals.
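A concrete instance of the kind of false dependence targeted here: a scalar temporary reused by every iteration creates write-after-write and write-after-read dependences that a conservative analysis must respect, even though no value actually flows between iterations. Recognizing that the live ranges of the temporary do not interfere (equivalently, privatizing it) removes those dependences and legalizes tiling; the code below is a generic illustration, not taken from the paper.

```cpp
#include <algorithm>

// Before: the single scalar t is reused by every iteration, so a conservative
// dependence analysis sees write-after-write and write-after-read dependences
// between iterations, which blocks tiling of this stencil-like loop nest.
void blur_like(const float* in, float* out, int N, int M) {
    float t;
    for (int i = 1; i < N - 1; ++i)
        for (int j = 0; j < M; ++j) {
            t = in[(i - 1) * M + j] + in[i * M + j] + in[(i + 1) * M + j];
            out[i * M + j] = t / 3.0f;
        }
}

// After: each iteration's live range of t is disjoint, so declaring t inside the
// loop body (privatization) removes the false dependences; only true data accesses
// remain, and the i/j loops can be tiled safely.
void blur_like_tiled(const float* in, float* out, int N, int M, int TI, int TJ) {
    for (int ii = 1; ii < N - 1; ii += TI)
        for (int jj = 0; jj < M; jj += TJ)
            for (int i = ii; i < std::min(ii + TI, N - 1); ++i)
                for (int j = jj; j < std::min(jj + TJ, M); ++j) {
                    float t = in[(i - 1) * M + j] + in[i * M + j] + in[(i + 1) * M + j];
                    out[i * M + j] = t / 3.0f;
                }
}
```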
A Deep Learning Based Cost Model for Automatic Code Optimization
TLDR
A novel deep learning based cost model for automatic code optimization is presented; it enables TIRAMISU to automatically find code transformations that match or outperform those of state-of-the-art compilers, without requiring the heavy feature engineering those compilers rely on.
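Such a model is typically used to rank candidate transformation sequences without running them: each candidate is featurized, the model predicts its benefit, and the best-scoring one is kept. In the C++ sketch below, `extract_features` and `Model::predict_speedup` are hypothetical placeholders rather than the paper's implementation.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical placeholders standing in for the paper's components.
using Features = std::vector<float>;                       // program + schedule characterization
struct Candidate { std::vector<std::string> transforms; }; // one sequence of code transformations

Features extract_features(const Candidate& c) {            // stub featurizer
    return Features{static_cast<float>(c.transforms.size())};
}

struct Model {
    // Stub for the learned model: returns a predicted speedup, higher is better.
    float predict_speedup(const Features& f) const { return f.empty() ? 1.0f : f[0]; }
};

// Rank candidate transformation sequences with the model instead of compiling
// and measuring each one; assumes `candidates` is non-empty.
Candidate pick_best(const std::vector<Candidate>& candidates, const Model& model) {
    return *std::max_element(candidates.begin(), candidates.end(),
        [&](const Candidate& a, const Candidate& b) {
            return model.predict_speedup(extract_features(a))
                 < model.predict_speedup(extract_features(b));
        });
}

int main() {
    Model m;
    std::vector<Candidate> cands = {{{"interchange"}}, {{"tile", "unroll"}}, {{"parallelize"}}};
    Candidate best = pick_best(cands, m);
    return static_cast<int>(best.transforms.size());        // placeholder use of the result
}
```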
PENCIL Language Specification
TLDR
PENCIL is presented: a rigorously defined subset of GNU C99 with specific programming rules and a few extensions that enable compilers to exploit parallelism and to better optimize code when targeting accelerators.