Corpus ID: 61579299

Electronic Structure Methods: The Tensor Contraction Engine

Authors: Alexander A. Auer, Gerald Baumgartner, David E. Bernholdt, Alina Bibireata, Daniel Cociorva, Xiaoyang Gao, Robert J. Harrison, Sriram Krishnamoorthy, Sandhya Krishnan, Chi-Chung Lam, Qingda Lu, Marcel Nooijen, Russell M. Pitzer, J. Ramanujam, P. Sadayappan, Alexander Sibiryakov
Format abstraction for sparse tensor algebra compilers
An interface that describes sparse tensor formats in terms of their capabilities and properties is developed; a modular code generator design makes it simple to add support for new formats, and the performance of the generated code is competitive with hand-optimized implementations.
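The capability-based description of formats can be sketched roughly as follows. This is a toy illustration in Python, not the paper's actual interface; the level names and property flags are assumptions modeled on common compressed-sparse layouts:

```python
# Hedged sketch: describe a tensor format as a list of per-dimension
# "level" formats, each declaring properties a code generator can query.
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    name: str
    full: bool      # does the level store every coordinate?
    ordered: bool   # are coordinates stored in order?
    unique: bool    # does each coordinate appear at most once?

DENSE = Level("dense", full=True, ordered=True, unique=True)
COMPRESSED = Level("compressed", full=False, ordered=True, unique=True)

CSR = [DENSE, COMPRESSED]   # dense rows, compressed columns

def supports_random_access(fmt):
    # a code generator might query properties like this to pick a
    # co-iteration strategy for a given expression
    return all(level.full for level in fmt)

print(supports_random_access(CSR))  # → False: compressed levels need a search
```

Composing formats from small level descriptions is what lets a modular generator handle new formats without per-format code paths.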
Sparse Tensor Algebra Optimizations with Workspaces
This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for …
Automatic Generation of Sparse Tensor Kernels with Workspaces
This work describes a compiler optimization called operator splitting that breaks up tensor sub-computations by introducing workspaces and shows that it increases the performance of important generated tensor kernels to match hand-optimized code.
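The workspace idea behind both papers can be illustrated with sparse vector addition. This is a hedged sketch, not the compilers' generated code; the function name and data layout are invented for illustration:

```python
def add_sparse_rows(n_cols, a, b):
    """c = a + b for sparse rows given as {col: val} dicts, using a
    dense temporary ("workspace") of length n_cols. Scattering into
    the workspace gives random-access accumulation, avoiding a merge
    of two sorted sparse structures."""
    w = [0.0] * n_cols
    seen = set()
    for row in (a, b):
        for j, v in row.items():
            w[j] += v          # O(1) scatter into the dense workspace
            seen.add(j)
    # gather: emit the nonzero coordinates back in sorted order
    return {j: w[j] for j in sorted(seen)}

c = add_sparse_rows(5, {0: 1.0, 3: 2.0}, {3: 1.0, 4: 5.0})
# c == {0: 1.0, 3: 3.0, 4: 5.0}
```

The trade-off a compiler weighs is the cost of allocating and re-sorting the workspace against the cost of merging sparse operands directly.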


Applied software measurement: assuring productivity and quality
This second edition fully discusses software metrics in relation to areas of acute interest today, with examples rooted in real-life case studies and with statistics newly culled from more than 6,000 corporate and government projects.
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization
This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures.
Global Arrays: a portable "shared-memory" programming model for distributed memory computers
The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes.
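One hedged way to picture this interface is the single-process stand-in below; it is not the real Global Arrays API, which performs one-sided communication on physically distributed blocks, but it shows the get/put-on-logical-blocks access pattern the abstract describes:

```python
# Conceptual sketch only: a local NumPy array stands in for distributed
# storage. The point is the interface shape: any caller can fetch or
# write a logical block without the owner's cooperation.
import numpy as np

class GlobalArraySketch:
    def __init__(self, shape):
        self._data = np.zeros(shape)   # stand-in for distributed blocks

    def get(self, lo, hi):
        """Fetch the logical block [lo, hi) into a local buffer."""
        return self._data[lo[0]:hi[0], lo[1]:hi[1]].copy()

    def put(self, lo, hi, buf):
        """Write a local buffer back to the logical block [lo, hi)."""
        self._data[lo[0]:hi[0], lo[1]:hi[1]] = buf

ga = GlobalArraySketch((4, 4))
ga.put((0, 0), (2, 2), np.eye(2))
block = ga.get((0, 0), (2, 2))      # local copy of the block
```

In the real toolkit these operations map onto one-sided remote memory access, which is what makes the "shared-memory" view portable across distributed-memory machines.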
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines
A framework for optimizing computational cost and communication cost has been developed that can be used to synthesize efficient code.
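The kind of cost reduction such a framework searches for can be illustrated by algebraically factoring a multi-dimensional sum. The shapes below are assumptions chosen for illustration:

```python
# Hedged illustration: factoring a sum-of-products changes the
# operation count without changing the result.
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 8, 8, 8
A, B, C = rng.random((I, J)), rng.random((J, K)), rng.random(K)

# Naive evaluation: one triple loop, ~I*J*K multiply-adds.
naive = sum(A[i, j] * B[j, k] * C[k]
            for i in range(I) for j in range(J) for k in range(K))

# Factored evaluation: ~J*K + I*J multiply-adds.
T = B @ C                            # T[j] = sum_k B[j,k] * C[k]
factored = float(A.sum(axis=0) @ T)  # sum_{i,j} A[i,j] * T[j]

assert abs(naive - factored) < 1e-9
```

Choosing such a factorization while also bounding intermediate storage and communication is the search problem the framework addresses.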
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
This paper describes an approach to synthesizing efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations; it combines loop fusion with loop tiling and uses a performance-model-driven approach to loop tiling for the generation of out-of-core code.
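The tiling half of that combination can be sketched minimally as follows; the in-memory array stands in for disk-resident data, and the tile size is an assumption:

```python
# Sketch of loop tiling for out-of-core computation: process a large
# array in tiles so only one tile needs to reside in memory at a time.
import numpy as np

N, TILE = 16, 4
on_disk = np.arange(N * N, dtype=float).reshape(N, N)  # stands in for a file

def load_tile(bi, bj):
    # in a real out-of-core code this would be a file read
    return on_disk[bi:bi + TILE, bj:bj + TILE]

total = 0.0
for bi in range(0, N, TILE):        # tiled (blocked) loop nest
    for bj in range(0, N, TILE):
        tile = load_tile(bi, bj)    # only this tile is "in core"
        total += tile.sum()
```

A performance model enters when choosing TILE: larger tiles amortize I/O but must fit within the memory budget alongside fused computations.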
Computers and Intractability: A Guide to the Theory of NP-Completeness
Memory-Constrained Communication Minimization for a Class of Array Computations
An approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit is developed.
Raising the Level of Programming Abstraction in Scalable Programming Models
This paper presents two distinctly different approaches to raising the level of abstraction of the programming model while maintaining or increasing performance: the Tensor Contraction Engine, a narrowly focused domain-specific language together with an optimizing compiler; and Extended Global Arrays, a programming framework that integrates programming models dealing with different layers of the memory/storage hierarchy using compiler analysis and code transformation techniques.