Corpus ID: 18484505

Polly's Polyhedral Scheduling in the Presence of Reductions

@article{Doerfert2015PollysPS,
  title={Polly's Polyhedral Scheduling in the Presence of Reductions},
  author={Johannes Doerfert and Kevin Streit and Sebastian Hack and Zino Benaissa},
  journal={ArXiv},
  year={2015},
  volume={abs/1505.07716}
}
The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism-prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions… 
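As a rough illustration of why reduction properties matter for parallelism, the sketch below contrasts a plain accumulation loop, in which every iteration reads and writes the same variable, with a privatized variant that accumulates independent partial sums and combines them afterwards. It is not Polly's implementation; the function names `sequential_sum` and `chunked_sum` and the `num_chunks` parameter are hypothetical, and the example assumes integer addition, which is associative and commutative.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(values):
    """Plain accumulation loop.

    Every iteration reads and writes `total`, so a dependence analysis that
    ignores the reduction property sees a loop-carried dependence.
    """
    total = 0
    for v in values:
        total += v
    return total


def chunked_sum(values, num_chunks=4):
    """Privatized variant (hypothetical helper, for illustration only).

    Each chunk accumulates into its own partial sum, so chunks do not depend
    on each other. The partial sums are combined in a final step. This is
    valid because integer addition is associative and commutative.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    # Ceiling division, with a minimum chunk size of 1 for short inputs.
    chunk_size = max(1, -(-len(values) // num_chunks))
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(sequential_sum, chunks))
    return sequential_sum(partials)
```

The thread pool only demonstrates that the chunks are independent; it is not meant to show a speedup. With floating-point values the reordered result may differ slightly from the sequential one, because floating-point addition is not associative.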

Polyhedral expression propagation
TLDR
This paper presents a technique that statically propagates expressions in order to avoid communicating their result via memory, which outperforms a state-of-the-art polyhedral optimization especially designed for this kind of program by a factor of up to 2.03×.
Polygeist: Raising C to Polyhedral MLIR
TLDR
The Polygeist/MLIR intermediate representation, featuring high-level (affine) loop constructs and n-D arrays embedded into a static single assignment (SSA) substrate, enables an unprecedented combination of SSA-based and polyhedral optimizations.
Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs
TLDR
This work presents language constructs that let a programmer express arbitrary reductions on user-defined data types matching the performance of tuned library implementations, and extends a polyhedral compilation flow to process these user-defined reductions.
Simplifying Multiple-Statement Reductions with the Polyhedral Model
TLDR
This work identifies and formalizes the multiple-statement reduction problem as a bilinear optimization problem, presents a heuristic optimization algorithm for these reductions, and demonstrates that the algorithm provides optimal complexity for a set of benchmark programs from the literature on probabilistic inference algorithms, whose performance critically relies on simplifying these reductions.
Scheduling and Tiling Reductions on Realistic Machines
TLDR
This work examines the techniques presented in Gupta et al., identifies a potential issue in their scheduling algorithm and provides a solution, demonstrates how these scheduling techniques can be extended to "tile" reductions, and briefly surveys other studies that address the problem of scheduling reductions.
md_poly: A Performance-Portable Polyhedral Compiler Based on Multi-Dimensional Homomorphisms
Programming state-of-the-art parallel architectures such as multi-core CPUs and many-core GPUs is challenging. For high performance, the programmer has to optimize its source code for the
Discovery and exploitation of general reductions: A constraint based approach
TLDR
A new compiler-based approach is developed that automatically detects a wide class of reductions and exploits histograms; it is based on a constraint formulation of the reduction idiom and has been implemented as an LLVM pass.
Application-Specific Arithmetic in High-Level Synthesis Tools
TLDR
This work studies hardware-specific optimization opportunities currently unexploited by high-level synthesis compilers; the optimizations are prototyped in the GeCoS source-to-source compiler and evaluated on the Polybench and EEMBC benchmark suites.
An Interval Compiler for Sound Floating-Point Computations
TLDR
IGen, a source-to-source compiler that translates a given C function using floating-point into an equivalent sound C function that uses interval arithmetic is presented, showing that the generated code delivers sound double precision results at high performance.
Automatically harnessing sparse acceleration
TLDR
A new approach based on the LiLAC specification is presented: rather than requiring the application developer to (re)write every program for a given library, the burden is shifted to a one-off description by the library implementer.

References

Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation
TLDR
Polly, an infrastructure for polyhedral optimizations on the compiler's internal, low-level intermediate representation (IR), is presented, along with an interface for connecting external optimizers and a novel way of using the parallelism they introduce to generate SIMD and OpenMP code.
The Polyhedral Model Is More Widely Applicable Than You Think
TLDR
This work concentrates on extending the code generation step and does not compromise the expressiveness of the model, presenting experimental evidence that the extension is relevant for program optimization and parallelization, showing performance improvements on benchmarks that were thought to be out of reach of the polyhedral model.
Scheduling reductions
TLDR
A scheduling method based on the algorithms from [Fea92a, Fea92b] that works in the presence of reductions is presented, and it is shown that side effects of scheduling reductions are the simplification of the scheduling process and the improvement of the computed schedules.
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model
TLDR
This work proposes an automatic transformation framework to optimize arbitrarily-nested loop sequences with affine dependences for parallelism and locality simultaneously and finds good tiling hyperplanes by embedding a powerful and versatile cost function into an Integer Linear Programming formulation.
When polyhedral transformations meet SIMD code generation
TLDR
This work defines the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelet, and uses the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codes, exploiting multi-dimensional data reuse.
Scan detection and parallelization in "inherently sequential" nested loop programs
TLDR
This work presents a method for automatically parallelizing "inherently sequential" programs, which handles arbitrarily nested loops, and identifies situations where the computation performed by the loop body is equivalent to a matrix vector product over a semi-ring.
Scheduling reductions on realistic machines
TLDR
An algorithm is developed to determine efficient serializations of all reductions in systems of affine recurrence equations over polyhedral domains.
Static analysis of upper and lower bounds on dependences and parallelism
TLDR
A two-step approach to the search for parallelism in sequential programs is proposed, which lets us distinguish inherently sequential code from code that contains unexploited parallelism and produces information about the kinds of transformations needed to parallelize the code, without worrying about the order of application of the transformations.
isl: An Integer Set Library for the Polyhedral Model
In compiler research, polytopes and related mathematical objects have been successfully used for several decades to represent and manipulate computer programs in an approach that has become known as
A framework for enhancing data reuse via associative reordering
TLDR
It is shown how stencil operations can be implemented to better exploit register reuse and reduce load/stores and a multi-dimensional retiming formalism is developed to characterize the space of valid implementations in conjunction with other program transformations.