Memory-Constrained Data Locality Optimization for Tensor Contractions
@inproceedings{Bibireata2003MemoryConstrainedDL, title={Memory-Constrained Data Locality Optimization for Tensor Contractions}, author={Alina Bibireata and Sandhya Krishnan and Gerald Baumgartner and Daniel Cociorva and Chi-Chung Lam and P. Sadayappan and J. Ramanujam and David E. Bernholdt and Venkatesh Choppella}, booktitle={LCPC}, year={2003} }
The accurate modeling of the electronic structure of atoms and molecules involves computationally intensive tensor contractions over large multi-dimensional arrays. Efficient computation of these contractions usually requires the generation of temporary intermediate arrays. These intermediates could be extremely large, requiring their storage on disk. However, the intermediates can often be generated and used in batches through appropriate loop fusion transformations. To optimize the…
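As a rough illustration of the loop fusion idea mentioned in the abstract, the sketch below evaluates a two-step contraction twice: once with a full intermediate array, and once with the producer and consumer loops fused so the intermediate shrinks to a scalar. The function names, the specific contraction, and the "intermediate size" return value are assumptions made for this example only; they are not taken from the paper, which addresses a more general memory-constrained optimization problem.

```python
# Hypothetical example: R[i] = sum_j (sum_k A[i][k] * B[k][j]) * C[j]
# All names here are illustrative and not taken from the paper.

def contract_unfused(A, B, C):
    """Materialize the full n x n intermediate T before consuming it."""
    n = len(A)
    # Producer loop nest: T[i][j] = sum_k A[i][k] * B[k][j]
    T = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Consumer loop nest: R[i] = sum_j T[i][j] * C[j]
    R = [sum(T[i][j] * C[j] for j in range(n)) for i in range(n)]
    return R, n * n  # result, number of intermediate elements stored


def contract_fused(A, B, C):
    """Fuse the i and j loops of producer and consumer; T becomes a scalar."""
    n = len(A)
    R = [0.0] * n
    for i in range(n):
        for j in range(n):
            t = 0.0  # one element of T, produced and consumed immediately
            for k in range(n):
                t += A[i][k] * B[k][j]
            R[i] += t * C[j]
    return R, 1  # result, number of intermediate elements stored


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [1, -1]
    print(contract_unfused(A, B, C))
    print(contract_fused(A, B, C))
```

Both versions perform the same arithmetic; only the storage needed for the intermediate differs. Fusing fewer loops (for example, only `i`) would leave a row-sized intermediate, which is the kind of trade-off a memory-constrained search has to weigh.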
17 Citations
Model-driven search-based loop fusion optimization for handwritten code
- Computer Science
- 2008
This thesis shows how to apply the loop fusion algorithm to handwritten code in a procedural language and outlines how the constraints on loop bounds expressions and array index expressions could be removed in the future using an algebraic cost model and an analysis of the iteration space using a polyhedral model.
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation
- Computer Science
- ACM Trans. Archit. Code Optim.
- 2022
This article proposes MemHC, an optimized systematic GPU memory management framework that aims to accelerate the calculation of many-body correlation functions utilizing a series of new memory reduction designs.
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions
- Computer Science
- J. Parallel Distributed Comput.
- 2012
Automatic code generation for many-body electronic structure methods: the tensor contraction engine
- Computer Science
- 2006
An overview of the Tensor Contraction Engine (TCE), a unique effort to address issues of both productivity and performance through automatic code generation that acts like an optimizing compiler.
Automatic transformation and optimization of applications on GPUs and GPU clusters
- Computer Science
- 2011
An auto-tuning framework is developed that selects algorithms and parameters according to a cost model and thresholds extracted from simple micro-benchmarks, together with a loop transformation system for multi-level memory hierarchies.
Out-of-Core Computations of High-Resolution Level Sets by Means of Code Transformation
- Computer Science
- J. Sci. Comput.
- 2012
A storage efficient, fast and parallelizable out-of-core framework for streaming computations of high resolution level sets which allows for the combination of interface propagation, re-normalization and narrow-band rebuild into a single pass over the data stored on disk.
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models
- Computer Science, Chemistry
- Proceedings of the IEEE
- 2005
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations that are expressible as a set of tensor contractions and arise in electronic structure modeling.
Memory-optimal evaluation of expression trees involving large objects
- Computer Science
- Comput. Lang. Syst. Struct.
- 1999
A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry
- Computer Science
- ACM/IEEE SC 2002 Conference (SC'02)
- 2002
This paper discusses an approach to the synthesis of high-performance parallel programs for a class of computations encountered in quantum chemistry and physics. These computations are expressible as…
Symbolic Algebra in Quantum Chemistry
- Physics
- 2006
New algorithms that automate the algebraic transformation and computer implementation of many-body quantum-mechanical methods for electron correlation enable a whole new class of highly complex but vastly accurate methods.
References
Global communication optimization for tensor contraction expressions under memory constraints
- Computer Science
- Proceedings International Parallel and Distributed Processing Symposium
- 2003
An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.
Loop optimization for a class of memory-constrained computations
- Computer Science
- ICS '01
- 2001
This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays, with the objective of minimizing cache misses while keeping the total memory usage within a given limit.
Space-time trade-off optimization for a class of electronic structure calculations
- Computer Science
- PLDI '02
- 2002
An algorithm is presented that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost that fits within a specified memory limit.
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization
- Computer Science
- HiPC
- 2001
This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
- Computer Science
- HiPC
- 2003
This paper describes an approach to the synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. The approach combines loop fusion with loop tiling and uses a performance-model-driven approach to loop tiling for the generation of out-of-core code.
On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution
- Computer Science
- Parallel Process. Lett.
- 1997
This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application; a pruning search strategy for determining an optimal form is developed.
Optimization of Memory Usage and Communication Requirements for a Class of Loops Implementing Multi-Dimensional Integrals
- Computer Science
- 1999
This paper proposes algorithms for finding loop fusion configurations that minimize memory usage under static and dynamic memory allocation models, and suggests ways to further reduce memory usage, when necessary, at the cost of increased arithmetic operations.
Performance optimization of a class of loops implementing multidimensional integrals
- Computer Science
- 1999
This thesis addresses the performance optimization of a class of loops that implement multi-dimensional summations and enhances the solutions to the various optimization problems to address the practically significant issues of sparsity, use of fast Fourier transforms, and utilization of common sub-expressions.
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals
- Computer Science
- LCPC
- 1999
This paper proposes an algorithm for finding a loop fusion configuration that minimizes memory usage and shows the performance improvement obtained by the algorithm on an electronic structure computation.
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines
- Computer Science
- PPSC
- 1997
A framework for optimizing computational cost and communication cost has been developed that can be used to synthesize efficient code.