Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models

@article{Baumgartner2005SynthesisOH,
  title={Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models},
  author={Gerald Baumgartner and Alexander A. Auer and David E. Bernholdt and Alina Bibireata and Venkatesh Choppella and Daniel Cociorva and Xiaoyang Gao and Robert J. Harrison and So Hirata and Sriram Krishnamoorthy and Sandhya Krishnan and Chi-Chung Lam and Qingda Lu and Marcel Nooijen and Russell M. Pitzer and J. Ramanujam and P. Sadayappan and Alexander Sibiryakov},
  journal={Proceedings of the IEEE},
  year={2005},
  volume={93},
  pages={276--292}
}
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics of the target architecture. Several components of the synthesis system are described, focusing on…

An Infrastructure for Scalable Parallel Programs for Computational Chemistry
TLDR
The Super Instruction Architecture (SIA) is described, along with its application to the implementation of algorithms for electronic structure calculations in computational chemistry. The methods are programmed in a domain-specific programming language called Super Instruction Assembly Language (SIAL).
A Task-based Execution Model for Coupled Cluster Methods
TLDR
Many-body systems, such as those simulated by the Coupled Cluster methods of the Quantum Chemistry package NWChem, are both computationally intensive and of interest to the Computational Chemistry community.
Refactoring a language for parallel computational chemistry
We describe a project to provide refactoring support for the SIAL programming language. SIAL is a domain-specific parallel programming language designed to express quantum chemistry computations. It…
Toward generalized tensor algebra for ab initio quantum chemistry methods
TLDR
This work presents an algebra to specify and perform tensor operations on a larger class of block-sparse tensors, and illustrates the use of this framework in expressing real-world computational chemistry calculations beyond the reach of existing frameworks.
A Block-Oriented Language and Runtime System for Tensor Algebra with Very Large Arrays
TLDR
A parallel programming environment, the Super Instruction Architecture (SIA) comprising a domain specific programming language SIAL and its runtime system SIP that are specialized for this class of problems, where programmers express algorithms in terms of operations on blocks rather than individual floating point numbers.
Performance modeling and optimization of parallel out-of-core tensor contractions
TLDR
A performance model for tensor contractions is developed, considering both disk I/O as well as inter-processor communication costs, to facilitate performance-model driven loop optimization for this domain.
Symbolic Algebra in Quantum Chemistry
TLDR
New algorithms that automate the algebraic transformation and computer implementation of many-body quantum-mechanical methods for electron correlation enable a whole new class of highly complex but vastly accurate methods.
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver
A Domain-Specific Compiler for Linear Algebra Operations
TLDR
A prototypical linear algebra compiler that automatically exploits domain-specific knowledge to generate high-performance algorithms that outperform the best existing libraries is presented.
Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions
TLDR
A performance model for tensor contractions is developed, considering both disk I/O as well as inter-processor communication costs, to facilitate performance-model driven loop optimization for this domain.

References

Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization
TLDR
This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.
General atomic and molecular electronic structure system
TLDR
A description of the ab initio quantum chemistry package GAMESS, in which chemical systems can be treated with wave functions ranging from the simplest closed-shell case up to a general MCSCF case, permitting calculations at the necessary level of sophistication.
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines
TLDR
A framework for optimizing computational cost and communication cost has been developed that can be used to synthesize efficient code.
Memory-Constrained Communication Minimization for a Class of Array Computations
TLDR
An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
TLDR
This paper describes an approach to the synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. The approach combines loop fusion with loop tiling and uses a performance-model-driven approach to loop tiling for the generation of out-of-core code.
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals
TLDR
This paper proposes an algorithm for finding a loop fusion configuration that minimizes memory usage and shows the performance improvement obtained by the algorithm on an electronic structure computation.
Global communication optimization for tensor contraction expressions under memory constraints
TLDR
An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.
Automatically Tuned Linear Algebra Software
TLDR
An approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units using the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS).
On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution
TLDR
This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application and a pruning search strategy for determination of an optimal form is developed.
Loop optimization for a class of memory-constrained computations
TLDR
This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays, with the objective of minimizing cache misses while keeping the total memory usage within a given limit.