# Automatic code generation for many-body electronic structure methods: the tensor contraction engine

@article{Auer2006AutomaticCG,
  title={Automatic code generation for many-body electronic structure methods: the tensor contraction engine},
  author={Alexander A. Auer and Gerald Baumgartner and David E. Bernholdt and Alina Bibireata and Venkatesh Choppella and Daniel Cociorva and Xiaoyang Gao and Robert J. Harrison and Sriram Krishnamoorthy and Sandhya Krishnan and Chi-Chung Lam and Qingda Lu and Marcel Nooijen and Russell M. Pitzer and J. Ramanujam and P. Sadayappan and Alexander Sibiryakov},
  journal={Molecular Physics},
  year={2006},
  volume={104},
  pages={211--228}
}

As both electronic structure methods and the computers on which they are run become increasingly complex, the task of producing robust, reliable, high-performance implementations of methods at a rapid pace becomes increasingly daunting. In this paper we present an overview of the Tensor Contraction Engine (TCE), a unique effort to address issues of both productivity and performance through automatic code generation. The TCE is designed to take equations for many-body methods in a convenient…

## 109 Citations

Generating Efficient Quantum Chemistry Codes for Novel Architectures.

- Computer Science
- Journal of Chemical Theory and Computation
- 2013

It is suggested that the meta-programming and empirical performance optimization approach may be important in future computational chemistry applications, especially in the face of quickly evolving computer architectures.

A Code Generator for High-Performance Tensor Contractions on GPUs

- Computer Science
- 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
- 2019

A high-performance GPU code generator for arbitrary tensor contractions that exploits domain-specific properties about data reuse in tensor contractions to devise an effective code generation schema and determine parameters for mapping of computation to threads and staging of data through the GPU memory hierarchy.

A case study in mechanically deriving dense linear algebra code

- Computer Science
- Int. J. High Perform. Comput. Appl.
- 2013

This paper uses DxT to derive the implementation of a representative matrix operation, two-sided Trmm, using a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and adding only a few transformations to accommodate the more complex two-sided Trmm.

Format abstraction for sparse tensor algebra compilers

- Computer Science
- Proc. ACM Program. Lang.
- 2018

An interface that describes formats in terms of their capabilities and properties is developed. A modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations.

Taco: A tool to generate tensor algebra kernels

- Computer Science
- 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2017

Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. To support programmers, the authors have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions.

Expression Tree Evaluation by Dynamic Code Generation - Are Accelerators Up for the Task?

- Computer Science
- 2013 42nd International Conference on Parallel Processing
- 2013

The paper sees a need for upcoming HPC systems to still be equipped with a significant portion of latency-oriented, and thus complex, general-purpose hardware, and it examines the benefit of accelerators for this scenario.

Optimizing tensor contraction expressions for hybrid CPU-GPU execution

- Computer Science
- Cluster Computing
- 2011

This paper presents an approach to automatically generating CUDA code that executes tensor contractions on GPUs, including management of data movement between CPU and GPU, and provides several effective optimization algorithms.

The tensor algebra compiler

- Computer Science
- Proc. ACM Program. Lang.
- 2017

The first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors is introduced, which is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.

AutoHOOT: Automatic High-Order Optimization for Tensors

- Computer Science
- PACT
- 2020

This work introduces AutoHOOT, the first automatic differentiation framework targeting high-order optimization for tensor computations, which contains a new explicit Jacobian/Hessian expression generation kernel whose outputs maintain the input tensors' granularity and are easy to optimize.

Generatively Programming Galerkin Projections on General Purpose Graphics Processing Units

- Computer Science
- 2009

A performance improvement of almost an order of magnitude over a multicore CPU implementation for the Advection-Diffusion equation on typical hardware performing computations using double-precision arithmetic is demonstrated.

## References


Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

- Computer Science
- HiPC
- 2001

This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.

Space-time trade-off optimization for a class of electronic structure calculations

- Computer Science
- PLDI '02
- 2002

An algorithm is presented that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost that fits within a specified memory limit.

Raising the Level of Programming Abstraction in Scalable Programming Models

- Computer Science
- 2004

This paper presents two distinctly different approaches to raising the level of abstraction of the programming model while maintaining or increasing performance: the Tensor Contraction Engine, a narrowly focused domain-specific language together with an optimizing compiler; and Extended Global Arrays, a programming framework that integrates programming models dealing with different layers of the memory/storage hierarchy using compiler analysis and code transformation techniques.

Memory-Constrained Data Locality Optimization for Tensor Contractions

- Computer Science
- LCPC
- 2003

An optimization framework to search among a space of fusion and tiling choices to minimize the data movement overhead is developed and is demonstrated on a computation representative of a component used in quantum chemistry suites.

The automated solution of second quantization equations with applications to the coupled cluster approach

- Computer Science
- 1991

In this research a program has been written in the C programming language which can efficiently compute the quasivacuum expectation value of a product of creation and annihilation operators and scalar arrays and which has been applied to open-shell coupled cluster theory.

Memory-Constrained Communication Minimization for a Class of Array Computations

- Computer Science
- LCPC
- 2002

An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms

- Computer Science
- HiPC
- 2003

This paper describes an approach to the synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. The approach combines loop fusion with loop tiling and uses a performance-model-driven approach to loop tiling for the generation of out-of-core code.

On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution

- Computer Science
- Parallel Process. Lett.
- 1997

This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application, and develops a pruning search strategy for determining an optimal form.

Global arrays: A nonuniform memory access programming model for high-performance computers

- Computer Science
- The Journal of Supercomputing
- 2004

The key concept of GAs is that they provide a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes.

Loop optimization for a class of memory-constrained computations

- Computer Science
- ICS '01
- 2001

This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays, with the objective of minimizing cache misses while keeping the total memory usage within a given limit.