Learn More
Tensor contractions are extremely compute intensive generalized matrix multiplication operations encountered in many computational science fields, such as quantum chemistry and nuclear physics. Unlike distributed matrix multiplication, which has been extensively studied, limited work has been done in understanding distributed tensor contractions. In this(More)
In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller unit of tasks, represented by an abstraction referred(More)
Tensor contractions represent the most compute- intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce(More)
Loop fusion is a key program transformation for data locality optimization that is implemented in production compilers. But optimizing compilers for imperative languages currently cannot ex- ploit fusion opportunities across a set of recursive tree traversal computations with producer-consumer relationships. In this paper, we develop a compile-time approach(More)
Convolutional Neural Networks (CNN) are a class of Ar- tificial Neural Networks (ANN) that are highly efficient at the pattern recognition tasks that underlie difficult AI prob- lems in a variety of domains, such as speech recognition, object recognition, and natural language processing. CNNs are, however, computationally intensive to train. This paper(More)
The four-index integral transform is a fundamental and computationally demanding calculation used in many computational chemistry suites such as NWChem. It transforms a four-dimensional tensor from one basis to another. This transformation is most efficiently implemented as a sequence of four tensor contractions that each contract a four- dimensional tensor(More)
  • 1