# Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models

@article{Baumgartner2005SynthesisOH, title={Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models}, author={Gerald Baumgartner and Alexander A. Auer and David E. Bernholdt and Alina Bibireata and Venkatesh Choppella and Daniel Cociorva and Xiaoyang Gao and Robert J. Harrison and So Hirata and Sriram Krishnamoorthy and Sandhya Krishnan and Chi-Chung Lam and Qingda Lu and Marcel Nooijen and Russell M. Pitzer and J. Ramanujam and P. Sadayappan and Alexander Sibiryakov}, journal={Proceedings of the IEEE}, year={2005}, volume={93}, pages={276-292} }

This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics of the target architecture. Several components of the synthesis system are described, focusing on…
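As a rough illustration of what "expressible as a set of tensor contractions" can mean in practice, the sketch below evaluates one small contraction two ways: as a single loop nest, and as two binary contractions through an intermediate array. The three-matrix example, the function names, and the square `n × n` shapes are assumptions made here for illustration; they are not taken from the paper, and the code is not the system's actual output.

```python
# Illustrative sketch only: the example contraction and all names are
# assumptions, not taken from the paper. Matrices are square (n x n).

def contract_direct(A, B, C):
    """S[a][b] = sum over i, j of A[a][i] * B[i][j] * C[j][b], as one loop nest."""
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    mults = 0
    for a in range(n):
        for b in range(n):
            for i in range(n):
                for j in range(n):
                    S[a][b] += A[a][i] * B[i][j] * C[j][b]
                    mults += 2  # two multiplications per innermost iteration
    return S, mults


def contract_factored(A, B, C):
    """Same result via an intermediate T[a][j] = sum over i of A[a][i] * B[i][j]."""
    n = len(A)
    T = [[0.0] * n for _ in range(n)]
    S = [[0.0] * n for _ in range(n)]
    mults = 0
    # First binary contraction: build the intermediate T.
    for a in range(n):
        for j in range(n):
            for i in range(n):
                T[a][j] += A[a][i] * B[i][j]
                mults += 1
    # Second binary contraction: combine T with C.
    for a in range(n):
        for b in range(n):
            for j in range(n):
                S[a][b] += T[a][j] * C[j][b]
                mults += 1
    return S, mults


n = 4
# Small integer-valued inputs so both evaluation orders agree exactly.
A = [[float(r + c) for c in range(n)] for r in range(n)]
B = [[float(r * c + 1) for c in range(n)] for r in range(n)]
C = [[float(r - c) for c in range(n)] for r in range(n)]

S_direct, m_direct = contract_direct(A, B, C)
S_factored, m_factored = contract_factored(A, B, C)
print(m_direct, m_factored)  # prints: 512 128
```

Both functions compute the same array, but the factored form needs `2 * n**3` multiplications instead of `2 * n**4`, at the cost of storing the intermediate `T`. Trade-offs of this kind between operation count and memory are the sort of decision a synthesis system could make automatically from a high-level specification.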

## 187 Citations

An Infrastructure for Scalable Parallel Programs for Computational Chemistry

- Computer Science
- 2008

The Super Instruction Architecture (SIA) is described, along with its application to implementing algorithms for electronic structure calculations in computational chemistry. The methods are programmed in a domain-specific programming language called Super Instruction Assembly Language (SIAL).

A Task-based Execution Model for Coupled Cluster Methods

- Computer Science
- 2014

Many-body systems, such as those simulated by the Coupled Cluster methods of the Quantum Chemistry package NWChem, are both computationally intensive and of interest to the Computational Chemistry community.

Refactoring a language for parallel computational chemistry

- Computer Science
- WRT '08
- 2008

We describe a project to provide refactoring support for the SIAL programming language. SIAL is a domain-specific parallel programming language designed to express quantum chemistry computations. It…

Toward generalized tensor algebra for ab initio quantum chemistry methods

- Computer Science
- ARRAY@PLDI
- 2019

This work presents an algebra to specify and perform tensor operations on a larger class of block-sparse tensors, and illustrates the use of this framework in expressing real-world computational chemistry calculations beyond the reach of existing frameworks.

A Block-Oriented Language and Runtime System for Tensor Algebra with Very Large Arrays

- Computer Science
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
- 2010

Presents a parallel programming environment, the Super Instruction Architecture (SIA), comprising a domain-specific programming language, SIAL, and its runtime system, SIP, both specialized for this class of problems. Programmers express algorithms in terms of operations on blocks rather than individual floating-point numbers.

Performance modeling and optimization of parallel out-of-core tensor contractions

- Computer Science
- PPOPP
- 2005

A performance model for tensor contractions is developed, considering both disk I/O as well as inter-processor communication costs, to facilitate performance-model driven loop optimization for this domain.

Symbolic Algebra in Quantum Chemistry

- Physics
- 2006

New algorithms that automate the algebraic transformation and computer implementation of many-body quantum-mechanical methods for electron correlation enable a whole new class of highly complex but vastly accurate methods.

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver

- Computer Science
- 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
- 2004

A Domain-Specific Compiler for Linear Algebra Operations

- Computer Science
- VECPAR
- 2012

A prototypical linear algebra compiler that automatically exploits domain-specific knowledge to generate high-performance algorithms that outperform the best existing libraries is presented.

Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions

- Computer Science
- 2005

A performance model for tensor contractions is developed, considering both disk I/O as well as inter-processor communication costs, to facilitate performance-model driven loop optimization for this domain.

## References

Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

- Computer Science
- HiPC
- 2001

This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.

General atomic and molecular electronic structure system

- Chemistry
- J. Comput. Chem.
- 1993

A description of the ab initio quantum chemistry package GAMESS, in which chemical systems can be treated with wave functions ranging from the simplest closed‐shell case up to a general MCSCF case, permitting calculations at the necessary level of sophistication.

Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines

- Computer Science
- PPSC
- 1997

A framework for optimizing computational cost and communication cost has been developed that can be used to synthesize efficient code.

Memory-Constrained Communication Minimization for a Class of Array Computations

- Computer Science
- LCPC
- 2002

An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms

- Computer Science
- HiPC
- 2003

This paper describes an approach to the synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. The approach combines loop fusion with loop tiling and uses a performance-model driven approach to loop tiling for the generation of out-of-core code.

Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals

- Computer Science
- LCPC
- 1999

This paper proposes an algorithm for finding a loop fusion configuration that minimizes memory usage and shows the performance improvement obtained by the algorithm on an electronic structure computation.

Global communication optimization for tensor contraction expressions under memory constraints

- Computer Science
- Proceedings International Parallel and Distributed Processing Symposium
- 2003

An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.

Automatically Tuned Linear Algebra Software

- Computer Science
- Proceedings of the IEEE/ACM SC98 Conference
- 1998

An approach is described for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units, concentrating on the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS).

On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution

- Computer Science
- Parallel Process. Lett.
- 1997

This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application; a pruning search strategy for determining an optimal form is developed.

Loop optimization for a class of memory-constrained computations

- Computer Science
- ICS '01
- 2001

This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays, with the objective of minimizing cache misses while keeping the total memory usage within a given limit.