# The tensor algebra compiler

```bibtex
@article{Kjolstad2017TheTA,
  title   = {The tensor algebra compiler},
  author  = {Fredrik Kjolstad and S. Kamil and Stephen Chou and David Lugato and Saman P. Amarasinghe},
  journal = {Proceedings of the ACM on Programming Languages},
  year    = {2017},
  volume  = {1},
  pages   = {1--29}
}
```

Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them…
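To make the abstract concrete, here is the kind of hand-written fused kernel the paper is about: computing a(i) = B(i,j)·c(j) + d(i) with B in CSR (compressed sparse row) in a single loop nest, with no temporary for B·c. This is a minimal illustrative sketch, not taco's generated code; the function and array names are assumptions.

```python
def spmv_plus_vec(pos, crd, vals, c, d):
    """Fused a(i) = B(i,j)*c(j) + d(i), with B stored in CSR.

    pos/crd/vals are the CSR row-pointer, column-index, and value arrays.
    Fusing the vector add into the SpMV loop avoids a temporary vector.
    """
    n = len(pos) - 1
    a = [0.0] * n
    for i in range(n):                        # iterate rows of B
        acc = 0.0
        for p in range(pos[i], pos[i + 1]):   # nonzeros of row i
            acc += vals[p] * c[crd[p]]
        a[i] = acc + d[i]                     # the add rides the same loop
    return a

# B = [[2, 0], [0, 3]] in CSR: pos=[0,1,2], crd=[0,1], vals=[2.0,3.0]
print(spmv_plus_vec([0, 1, 2], [0, 1], [2.0, 3.0], [1.0, 1.0], [10.0, 20.0]))
# → [12.0, 23.0]
```

Writing such a kernel by hand for every combination of operation, format, and sparsity pattern is exactly the burden the paper's compiler removes.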

#### 152 Citations

Taco: A tool to generate tensor algebra kernels

- Computer Science
- 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2017

Tensor algebra is an important computational abstraction increasingly used in data analytics, machine learning, engineering, and the physical sciences. To support programmers, the authors have developed taco, a code-generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions.

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR

- Computer Science
- ArXiv
- 2021

The results show that the automatically generated kernels outperform the state-of-the-art sparse tensor algebra compiler TACO, with performance improvements of up to 20.92x, 6.39x, and 13.9x for parallel SpMV, SpMM, and TTM, respectively.

Automatic Generation of Sparse Tensor Kernels with Workspaces

- Computer Science
- ArXiv
- 2018

This work describes a compiler optimization called operator splitting that breaks up tensor sub-computations by introducing workspaces, and shows that it raises the performance of important generated tensor kernels to match hand-optimized code.

A Tensor Algebra Compiler library interface and runtime

- Computer Science
- 2019

A new API for the taco library is presented that removes the need to call compiler methods by introducing a delayed-execution framework, and that adds multiple important tensor algebra features previously unavailable in taco.

ExTensor: An Accelerator for Sparse Tensor Algebra

- Computer Science
- MICRO
- 2019

The ExTensor accelerator is proposed, which builds novel ideas for handling sparsity into hardware to enable better bandwidth utilization and compute throughput, and is evaluated on several kernels against industry libraries and state-of-the-art tensor algebra compilers.

Tensor Algebra Compilation with Workspaces

- Computer Science
- 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
- 2019

The results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations, and enables generating sparse matrix multiplication and MTTKRP with sparse output, neither of which was supported by prior tensor algebra compilers.

Sparse Tensor Algebra Optimizations with Workspaces

- Computer Science
- 2018

This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for…
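The workspace idea in the two entries above can be sketched in a few lines: a sparse result is accumulated by scattering into a dense temporary (where random access is O(1)), then gathering the nonzeros back out. This is an illustrative sketch of the technique, not taco's generated code; the names are assumptions.

```python
def sparse_add(n, b_crd, b_vals, c_crd, c_vals):
    """a = b + c for sparse vectors of length n, via a dense workspace.

    Each sparse vector is a (coordinate list, value list) pair.
    """
    w = [0.0] * n                      # dense workspace (temporary tensor)
    for p, j in enumerate(b_crd):      # scatter b into the workspace
        w[j] += b_vals[p]
    for p, j in enumerate(c_crd):      # scatter c; O(1) random access
        w[j] += c_vals[p]
    a_crd, a_vals = [], []
    for j in range(n):                 # gather nonzeros into sparse output
        if w[j] != 0.0:
            a_crd.append(j)
            a_vals.append(w[j])
    return a_crd, a_vals

print(sparse_add(5, [0, 3], [1.0, 2.0], [3, 4], [4.0, 5.0]))
# → ([0, 3, 4], [1.0, 6.0, 5.0])
```

Without the workspace, merging two sorted coordinate lists requires an explicit two-pointer merge loop; the dense temporary trades memory for simpler, faster scatter/gather code.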

Tensor Relational Algebra for Distributed Machine Learning System Design

- Computer Science
- Proc. VLDB Endow.
- 2021

The TRA is a set-based algebra, grounded in the relational algebra, that is easily executed with high efficiency in a parallel or distributed environment and is amenable to automatic optimization.

A Sparse Tensor Benchmark Suite for CPUs and GPUs

- Computer Science
- 2020 IEEE International Symposium on Workload Characterization (IISWC)
- 2020

A set of reference tensor kernel implementations is presented that is compatible with real-world tensors and with power-law tensors extended from synthetic graph-generation techniques, and Roofline performance models for these kernels are proposed to provide insight into computing platforms from a sparse-tensor perspective.

A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra

- Computer Science
- ArXiv
- 2020

This work shows that standard loop transformations, such as strip-mining, tiling, collapsing, parallelization, and vectorization, can be applied to irregular loops over sparse iteration spaces, and that the generated code is competitive with many hand-optimized implementations from the literature.
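One of the transformations named above, strip-mining, can be applied even to the irregular position loop of a CSR SpMV: the loop over a row's nonzeros is split into fixed-size strips, the shape a compiler would produce before vectorizing or parallelizing each strip. A minimal sketch, with illustrative names:

```python
def spmv_stripmined(pos, crd, vals, x, strip=4):
    """y = A*x with A in CSR, the per-row nonzero loop strip-mined."""
    n = len(pos) - 1
    y = [0.0] * n
    for i in range(n):
        lo, hi = pos[i], pos[i + 1]
        for p0 in range(lo, hi, strip):                   # loop over strips
            for p in range(p0, min(p0 + strip, hi)):      # strip body
                y[i] += vals[p] * x[crd[p]]
    return y

# A = [[1, 2], [0, 3]] in CSR: pos=[0,2,3], crd=[0,1,1], vals=[1.0,2.0,3.0]
print(spmv_stripmined([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0], [1.0, 1.0]))
# → [3.0, 3.0]
```

The irregularity (rows of varying length) is confined to the strip bounds; each inner strip is a dense, fixed-trip-count loop amenable to vectorization.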

#### References

Showing 1–10 of 99 references

Sparse Tensor Algebra as a Parallel Programming Model

- Computer Science
- ArXiv
- 2015

This work extends the usual basic operations of tensor summation and contraction to arbitrary functions, and to further operations such as reductions and mapping; it shows how key graph algorithms as well as common numerical kernels can be succinctly expressed using this interface, and provides performance results for a general library implementation.

Sparso: Context-driven optimizations of sparse linear algebra

- Computer Science
- 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)
- 2016

In Sparso, a compiler and sparse linear algebra libraries collaboratively discover and exploit context, defined as the invariant properties of matrices and the relationships between them in a program, to drive key optimizations across library routines and matrices.

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

- Computer Science
- 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015

Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…

High-Performance Tensor Contraction without Transposition

- Computer Science
- SIAM J. Sci. Comput.
- 2018

This work implements TC using the flexible BLAS-like Instantiation Software (BLIS) framework, which allows transposition (reshaping) of the tensor to be fused with internal partitioning and packing operations, requiring no explicit transposition operations or additional workspace.

Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures

- Computer Science
- 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3)
- 2016

The optimized design and implementation of sparse tensor-times-dense-matrix multiply (SpTTM) for CPU and GPU platforms is presented; SpTTM is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition.

Efficient and scalable computations with sparse tensors

- Computer Science
- 2012 IEEE Conference on High Performance Extreme Computing
- 2012

This paper describes new sparse tensor storage formats that provide storage benefits and are flexible and efficient for performing tensor computations, and proposes an optimization that improves data reuse and reduces redundant or unnecessary computation in tensor decomposition algorithms.

Optimization of symmetric tensor computations

- Computer Science
- 2015 IEEE High Performance Extreme Computing Conference (HPEC)
- 2015

Novel optimizations are described that exploit the symmetry in tensors to reduce redundancy in computation and storage, and to effectively parallelize operations involving symmetric tensors.

Data-Parallel Language for Correct and Efficient Sparse Matrix Codes

- Computer Science
- 2011

LL, a small functional language suitable for implementing operations on sparse matrices, is presented, along with a compiler for LL programs that generates efficient, parallel C code via a straightforward, syntax-directed translation.

Tensor-matrix products with a compressed sparse tensor

- Computer Science
- IA3@SC
- 2015

The compressed sparse fiber (CSF) data structure for sparse tensors is introduced, along with a novel parallel algorithm for tensor-matrix multiplication; CSF offers operation reductions similar to existing compressed methods while using only a single tensor structure.

OSKI: A Library of Automatically Tuned Sparse Matrix Kernels

- Computer Science
- 2005

An overview of OSKI, which is based on research into automatically tuned sparse kernels for modern cache-based superscalar machines, is provided; the primary aim of its interface is to hide the complex decision-making needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine.