• Corpus ID: 123887275

Integrated compiler optimizations for tensor contractions

P. Sadayappan, Xiaoyang Gao
This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler to synthesize parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute… 
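The abstract does not show what a tensor contraction looks like as code. Here is a minimal sketch, assuming the small example contraction C[i][j] = Σ_{k,l} A[i][k][l] · B[k][l][j]. That formula and the helper name `contract` are illustrative only and are not taken from TCE. The function writes the contraction as the plain loop nest that a compiler of this kind would start from. The summation indices k and l disappear, and the free indices i and j survive in the result.

```python
from itertools import product

def contract(A, B, ni, nj, nk, nl):
    """Naive loop nest for C[i][j] = sum over k, l of A[i][k][l] * B[k][l][j].

    Hypothetical helper for illustration; tensors are nested Python lists.
    """
    C = [[0.0] * nj for _ in range(ni)]
    # k and l are contracted (summed) indices; i and j index the result.
    for i, j, k, l in product(range(ni), range(nj), range(nk), range(nl)):
        C[i][j] += A[i][k][l] * B[k][l][j]
    return C

# Usage: 2x2x2 inputs filled with small integers.
A = [[[i + k + l for l in range(2)] for k in range(2)] for i in range(2)]
B = [[[k * l + j for j in range(2)] for l in range(2)] for k in range(2)]
C = contract(A, B, 2, 2, 2, 2)  # [[2.0, 6.0], [3.0, 11.0]]
```

The operation count of such a loop nest is the product of all index ranges, here ni·nj·nk·nl. This is why loop order, fusion, and tiling of these nests matter so much.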


Global communication optimization for tensor contraction expressions under memory constraints
An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization
This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.
Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions
A performance model for tensor contractions is developed, considering both disk I/O as well as inter-processor communication costs, to facilitate performance-model driven loop optimization for this domain.
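The summary above does not spell out the cost model, so what follows is only a rough sketch of the disk I/O part of such a model. It uses a textbook model for a tiled n×n matrix multiply C = A·B, which is the simplest tensor contraction. Assumptions: square t×t tiles, the C tile kept in memory across the k loop, three tiles resident at once, and inter-processor communication ignored. The names `io_volume` and `best_tile` are hypothetical and do not come from the paper.

```python
def io_volume(n, t):
    """Modeled elements moved between disk and memory for a tiled n x n
    matrix multiply with t x t tiles, tile loop order (i, j, k), where the
    C tile stays in memory across the k loop. Assumes t divides n."""
    b = n // t                        # tiles per dimension
    reads_ab = b ** 3 * 2 * t * t     # one A tile and one B tile per (i, j, k)
    writes_c = b * b * t * t          # each C tile is written once
    return reads_ab + writes_c        # equals 2*n**3/t + n**2

def best_tile(n, mem_limit, candidates):
    """Return the candidate tile size with the lowest modeled I/O whose
    three resident tiles (A, B, C) fit within mem_limit elements."""
    feasible = [t for t in candidates if n % t == 0 and 3 * t * t <= mem_limit]
    if not feasible:
        raise ValueError("no candidate tile size fits in memory")
    return min(feasible, key=lambda t: io_volume(n, t))

# Usage: with room for three 256x256 tiles, 512 is infeasible, so 256 wins.
print(best_tile(1024, 3 * 256 * 256, [64, 128, 256, 512]))
```

Under this model the I/O term shrinks as 2n³/t, so the best choice is the largest tile that fits in memory. A model that also counted communication could trade this off against partitioning costs.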
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions
Novel pruning strategies are developed whereby a search problem in a larger space is replaced by a large number of searches in a much smaller space, to determine the optimal permutation, fusion, tiling and placement of disk I/O statements.
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
This paper describes an approach to synthesizing efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. The approach combines loop fusion with loop tiling and uses a performance-model-driven approach to loop tiling when generating out-of-core code.
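Loop tiling for out-of-core execution can be sketched roughly as follows. A matrix multiply is computed tile by tile, and each tile fetch is counted as if it were a disk read. Assumptions: square n×n inputs held in Python lists, t dividing n, and an in-memory copy standing in for real file I/O. The helper `tiled_matmul` is hypothetical and is not the paper's generated code.

```python
def tiled_matmul(A, B, n, t):
    """C = A @ B for n x n nested lists, computed one t x t tile at a time.
    Each call to load() models reading one tile from disk; assumes t divides n."""
    C = [[0.0] * n for _ in range(n)]
    loads = 0

    def load(M, r, c):
        # Copy one tile into a fresh buffer, standing in for a disk read.
        nonlocal loads
        loads += 1
        return [row[c:c + t] for row in M[r:r + t]]

    for ii in range(0, n, t):
        for jj in range(0, n, t):
            acc = [[0.0] * t for _ in range(t)]   # C tile stays resident
            for kk in range(0, n, t):
                a, b = load(A, ii, kk), load(B, kk, jj)
                for i in range(t):
                    for k in range(t):
                        aik = a[i][k]
                        for j in range(t):
                            acc[i][j] += aik * b[k][j]
            for i in range(t):
                C[ii + i][jj:jj + t] = acc[i]     # write the finished tile back
    return C, loads

# Usage: a 4x4 multiply with 2x2 tiles performs 2 * (4 // 2) ** 3 = 16 tile loads.
A = [[i * 4 + j for j in range(4)] for i in range(4)]
B = [[(i + j) % 3 for j in range(4)] for i in range(4)]
C, loads = tiled_matmul(A, B, 4, 2)
```

A larger t means fewer loads but more memory per tile. That tension is what a performance model has to resolve when it picks tile sizes.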
Compilation Techniques for Out-of-Core Parallel Computations
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations
This paper develops an effective heuristic approach to the operation minimization problem, and demonstrates its effectiveness on tensor contraction expressions for coupled cluster equations.
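Operation minimization works by restructuring a multi-term contraction so that it runs through cheaper intermediates. Here is a minimal illustration using the assumed expression S[i] = Σ_{j,k} A[i][j] · B[j][k] · v[k]. This is a toy case chosen for clarity and is not one of the coupled cluster expressions. Evaluated as written, it needs about 2n³ multiplications. Factoring out the intermediate W[j] = Σ_k B[j][k] · v[k] brings that down to about 2n². The functions `direct` and `factored` are hypothetical names.

```python
def direct(A, B, v, n):
    """S[i] = sum over j, k of A[i][j] * B[j][k] * v[k], evaluated as written:
    two multiplications per (i, j, k) term."""
    ops = 0
    S = [0] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                S[i] += A[i][j] * B[j][k] * v[k]
                ops += 2
    return S, ops

def factored(A, B, v, n):
    """Same S via the intermediate W[j] = sum over k of B[j][k] * v[k]:
    n**2 multiplications to build W, then n**2 more to form S."""
    ops = 0
    W = [0] * n
    for j in range(n):
        for k in range(n):
            W[j] += B[j][k] * v[k]
            ops += 1
    S = [0] * n
    for i in range(n):
        for j in range(n):
            S[i] += A[i][j] * W[j]
            ops += 1
    return S, ops

# Usage: both give the same S; for n = 3 the counts are 54 versus 18.
n = 3
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
v = [1, 2, 3]
```

Real expressions have many possible factorizations and orderings, and the search space grows quickly. That growth is why the paper turns to a heuristic instead of exhaustive enumeration.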
Loop optimization for a class of memory-constrained computations
This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays, with the objective of minimizing cache misses while keeping the total memory usage within a given limit.
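The memory-reduction side of loop fusion can be sketched on an assumed two-step computation: T[i][j] = Σ_k A[i][k] · B[k][j] followed by S[i] = Σ_j T[i][j] · c[j]. Without fusion, the full n×n intermediate T is materialized. Fusing the i and j loops of producer and consumer means each T[i][j] is used right after it is computed, so the temporary shrinks to a scalar. The functions `unfused` and `fused` are hypothetical. This sketch leaves out the tiling half of the paper's integrated model.

```python
def unfused(A, B, c, n):
    """Materialize the full n x n intermediate T, then consume it.
    Returns (S, number of T elements held at once)."""
    T = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    S = [sum(T[i][j] * c[j] for j in range(n)) for i in range(n)]
    return S, n * n

def fused(A, B, c, n):
    """Producer and consumer loops over i and j are fused, so each T[i][j]
    is consumed immediately and only a scalar temporary is live."""
    S = [0] * n
    for i in range(n):
        for j in range(n):
            t = sum(A[i][k] * B[k][j] for k in range(n))  # one element of T
            S[i] += t * c[j]
    return S, 1

# Usage: identical S, but the intermediate shrinks from n*n elements to 1.
n = 4
A = [[i - j for j in range(n)] for i in range(n)]
B = [[(i * j) % 5 for j in range(n)] for i in range(n)]
c = [1, 0, 2, 1]
```

In this case fusion leaves the operation count unchanged. However, a fused nest limits which loop orders and tile shapes remain legal, so tiling and fusion have to be chosen together under the memory limit.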
Compiler support for out-of-core arrays on parallel machines
In general, the compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fortran D's DECOMPOSITION, ALIGN, and DISTRIBUTE statements.
A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations
A unified framework is presented that optimizes out-of-core programs by exploiting locality and parallelism and by reducing communication overhead. The base algorithm is extended to work with file layout constraints, and the paper shows how it can be used to optimize programs that consist of multiple loop nests.