The tensor algebra compiler

@article{Kjolstad2017TheTA,
  title={The tensor algebra compiler},
  author={Fredrik Kjolstad and Shoaib Kamil and Stephen Chou and David Lugato and Saman P. Amarasinghe},
  journal={Proceedings of the ACM on Programming Languages},
  year={2017},
  volume={1},
  pages={1--29}
}
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them…
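As a concrete illustration of the kind of compound expression the compiler targets, the sketch below writes y = A*x + z against the public taco C++ API, with a CSR matrix and dense vectors. The dimensions and values are made up, and the code should be read as an illustrative sketch based on the taco documentation rather than a verbatim excerpt from the paper.

#include "taco.h"
using namespace taco;

int main() {
  // CSR matrix format (dense rows, compressed columns) and a dense vector format.
  Format csr({Dense, Sparse});
  Format dv({Dense});

  // Operand and result tensors; the dimensions here are arbitrary.
  Tensor<double> A({4, 4}, csr);
  Tensor<double> x({4}, dv);
  Tensor<double> z({4}, dv);
  Tensor<double> y({4}, dv);

  // Insert a few nonzeros and pack the operands into their storage formats.
  A.insert({0, 1}, 2.0);
  A.insert({2, 3}, 3.0);
  A.pack();
  for (int k = 0; k < 4; ++k) { x.insert({k}, 1.0); z.insert({k}, 0.5); }
  x.pack();
  z.pack();

  // A compound expression in index notation; taco fuses it into one sparse kernel.
  IndexVar i, j;
  y(i) = A(i, j) * x(j) + z(i);

  y.compile();   // generate code for this expression and these formats
  y.assemble();  // compute the output's nonzero structure
  y.compute();   // compute the output's values
  return 0;
}
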
Citations

Taco: A tool to generate tensor algebra kernels
TLDR
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences; to support programmers, the authors have developed taco, a code generation tool that produces dense, sparse, and mixed kernels from tensor algebra expressions.
A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR
TLDR
The results show that the automatically generated kernels outperform the state-of-the-art sparse tensor algebra compiler TACO, with up to 20.92x, 6.39x, and 13.9x performance improvements for parallel SpMV, SpMM, and TTM, respectively.
Automatic Generation of Sparse Tensor Kernels with Workspaces
TLDR
This work describes a compiler optimization called operator splitting that breaks up tensor sub-computations by introducing workspaces, and shows that it raises the performance of important generated tensor kernels to match hand-optimized code.
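To make the workspace idea concrete, here is a plain C++ sketch (not taco-generated code) of sparse matrix-matrix multiplication in the Gustavson style, where a dense per-row workspace accumulates values before they are scattered into the compressed output. The struct and variable names are illustrative only.

#include <vector>

// Minimal CSR representation (illustrative, not taco's internal types).
struct Csr {
  int rows = 0, cols = 0;
  std::vector<int> pos;     // row pointers, size rows + 1
  std::vector<int> crd;     // column indices of nonzeros
  std::vector<double> val;  // nonzero values
};

// C = A * B with a dense row workspace: sparse-sparse accumulation becomes
// random access into a dense array, which is the essence of the transformation.
Csr spgemm_with_workspace(const Csr& A, const Csr& B) {
  Csr C;
  C.rows = A.rows;
  C.cols = B.cols;
  C.pos.assign(A.rows + 1, 0);

  std::vector<double> w(B.cols, 0.0);    // dense workspace for one output row
  std::vector<bool> occupied(B.cols, false);
  std::vector<int> touched;              // columns written in this row

  for (int i = 0; i < A.rows; ++i) {
    touched.clear();
    // Accumulate row i of C into the workspace.
    for (int pA = A.pos[i]; pA < A.pos[i + 1]; ++pA) {
      int k = A.crd[pA];
      for (int pB = B.pos[k]; pB < B.pos[k + 1]; ++pB) {
        int j = B.crd[pB];
        if (!occupied[j]) { occupied[j] = true; touched.push_back(j); }
        w[j] += A.val[pA] * B.val[pB];
      }
    }
    // Scatter the workspace into compressed row i of C, then reset it.
    // (Column order within a row follows discovery order in this sketch.)
    for (int j : touched) {
      C.crd.push_back(j);
      C.val.push_back(w[j]);
      w[j] = 0.0;
      occupied[j] = false;
    }
    C.pos[i + 1] = static_cast<int>(C.crd.size());
  }
  return C;
}
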
A Tensor Algebra Compiler library interface and runtime
TLDR
A new API for the taco library is presented that removes the need to call compiler methods by introducing a delayed execution framework, and adds several important tensor algebra features previously unavailable in taco.
ExTensor: An Accelerator for Sparse Tensor Algebra
TLDR
The ExTensor accelerator is proposed, which builds novel ideas for handling sparsity into hardware to enable better bandwidth utilization and compute throughput, and is evaluated on several kernels against industry libraries and state-of-the-art tensor algebra compilers.
Tensor Algebra Compilation with Workspaces
TLDR
The results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations and enables generating sparse matrix multiplication and MTTKRP with sparse output, neither of which was supported by prior tensor algebra compilers.
Sparse Tensor Algebra Optimizations with Workspaces
This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for…
Tensor Relational Algebra for Distributed Machine Learning System Design
TLDR
The TRA is a set-based algebra, grounded in the relational algebra, that is easily executed with high efficiency in a parallel or distributed environment and is amenable to automatic optimization.
A Sparse Tensor Benchmark Suite for CPUs and GPUs
TLDR
A set of reference tensor kernel implementations is presented that covers real-world tensors as well as power-law tensors derived from synthetic graph generation techniques, and Roofline performance models for these kernels are proposed to give insight into computer platforms from a sparse tensor perspective.
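For reference, the Roofline model mentioned above bounds attainable throughput by the minimum of the machine's compute peak and the product of its memory bandwidth and the kernel's arithmetic intensity. The helper below, with made-up machine numbers, is only meant to show that arithmetic.

#include <algorithm>
#include <cstdio>

// Roofline bound: attainable GFLOP/s = min(peak GFLOP/s, intensity * peak GB/s).
double roofline_gflops(double peak_gflops, double peak_gbps, double flops_per_byte) {
  return std::min(peak_gflops, flops_per_byte * peak_gbps);
}

int main() {
  // Hypothetical machine: 500 GFLOP/s peak compute, 100 GB/s memory bandwidth.
  // Double-precision CSR SpMV is roughly 0.17 flops/byte, so it is memory bound.
  std::printf("SpMV bound: %.1f GFLOP/s\n", roofline_gflops(500.0, 100.0, 0.17));
  return 0;
}
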
A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra
TLDR
This work shows that standard loop transformations, such as strip-mining, tiling, collapsing, parallelization, and vectorization, can be applied to irregular loops over sparse iteration spaces, and generates code that is competitive with many hand-optimized implementations from the literature.
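As a small illustration of one of the transformations listed above, the sketch below strip-mines the row loop of a CSR SpMV kernel so the resulting block loop can be parallelized. This is generic C++ with an OpenMP pragma, not output of the framework described in the paper, and the names are illustrative.

#include <algorithm>
#include <vector>

// Strip-mined (tiled) CSR SpMV: the row loop is split into blocks so the block
// loop can be parallelized while the inner loop still walks the sparse row.
void spmv_strip_mined(int rows, int blockSize,
                      const std::vector<int>& pos,
                      const std::vector<int>& crd,
                      const std::vector<double>& val,
                      const std::vector<double>& x,
                      std::vector<double>& y) {
  #pragma omp parallel for
  for (int ib = 0; ib < rows; ib += blockSize) {        // strip (block) loop
    int iEnd = std::min(ib + blockSize, rows);
    for (int i = ib; i < iEnd; ++i) {                   // rows within the strip
      double sum = 0.0;
      for (int p = pos[i]; p < pos[i + 1]; ++p)         // sparse column loop
        sum += val[p] * x[crd[p]];
      y[i] = sum;
    }
  }
}
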

References

SHOWING 1-10 OF 99 REFERENCES
Sparse Tensor Algebra as a Parallel Programming Model
TLDR
This work extends the usual basic operations of tensor summation and contraction to arbitrary functions, adds further operations such as reductions and mapping, shows how key graph algorithms as well as common numerical kernels can be expressed succinctly using this interface, and provides performance results for a general library implementation.
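To show the flavor of expressing a graph algorithm as a sparse tensor operation, the sketch below performs one Bellman-Ford relaxation step as a sparse matrix-vector product over the (min, +) semiring. The CSR layout and function names are illustrative and are not taken from the paper's library.

#include <algorithm>
#include <limits>
#include <vector>

// One SSSP relaxation step, d' = A min.+ d, where row i of the CSR matrix lists
// the incoming edges j -> i with their weights. Iterating to a fixpoint yields
// Bellman-Ford; swapping the semiring gives other graph algorithms.
std::vector<double> sssp_step(int n,
                              const std::vector<int>& pos,
                              const std::vector<int>& crd,
                              const std::vector<double>& weight,
                              const std::vector<double>& dist) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> next(dist);                        // keep current distances
  for (int i = 0; i < n; ++i) {
    for (int p = pos[i]; p < pos[i + 1]; ++p) {
      int j = crd[p];
      if (dist[j] < inf)                                 // "multiply" is +, "add" is min
        next[i] = std::min(next[i], dist[j] + weight[p]);
    }
  }
  return next;
}
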
Sparso: Context-driven optimizations of sparse linear algebra
TLDR
In Sparso, a compiler and sparse linear algebra libraries collaboratively discover and exploit context, defined as the invariant properties of matrices and the relationships between them in a program, to drive key optimizations across library routines and matrices.
SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication
Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…
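For context, the core kernel SPLATT targets is the matricized tensor times Khatri-Rao product (MTTKRP). The sketch below spells out that computation for a 3-way tensor in coordinate form; SPLATT itself uses a compressed layout and careful parallel scheduling, so this is only meant to show what the kernel computes.

#include <cstddef>
#include <vector>

// MTTKRP for a 3-way COO tensor: M(i,:) += X(i,j,k) * (B(j,:) .* C(k,:)).
void mttkrp_coo(const std::vector<int>& I, const std::vector<int>& J,
                const std::vector<int>& K, const std::vector<double>& vals,
                const std::vector<std::vector<double>>& B,   // J_dim x R factor
                const std::vector<std::vector<double>>& C,   // K_dim x R factor
                std::vector<std::vector<double>>& M) {       // I_dim x R output
  const std::size_t R = M.empty() ? 0 : M[0].size();
  for (std::size_t nz = 0; nz < vals.size(); ++nz) {
    int i = I[nz], j = J[nz], k = K[nz];
    for (std::size_t r = 0; r < R; ++r)
      M[i][r] += vals[nz] * B[j][r] * C[k][r];
  }
}
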
High-Performance Tensor Contraction without Transposition
TLDR
This work implements tensor contraction (TC) using the flexible BLAS-like Instantiation Software (BLIS) framework, which allows transposition (reshaping) of the tensor to be fused with internal partitioning and packing operations, requiring no explicit transposition operations or additional workspace.
Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures
TLDR
An optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM), a critical bottleneck in data analysis and mining applications based on tensor methods such as the Tucker decomposition, is presented for CPU and GPU platforms.
Efficient and scalable computations with sparse tensors
TLDR
This paper describes new sparse tensor storage formats that provide storage benefits and are flexible and efficient for performing tensor computations, and proposes an optimization that improves data reuse and reduces redundant or unnecessary computations in tensor decomposition algorithms.
Optimization of symmetric tensor computations
TLDR
Novel optimizations are described that exploit the symmetry in tensors to reduce redundancy in computation and storage and to effectively parallelize operations involving symmetric tensors.
Data-Parallel Language for Correct and Efficient Sparse Matrix Codes
TLDR
LL, a small functional language suitable for implementing operations on sparse matrices, is presented, along with a compiler for LL programs that generates efficient, parallel C code through a straightforward, syntax-directed translation.
Tensor-matrix products with a compressed sparse tensor
TLDR
The compressed sparse fiber (CSF), a data structure for sparse tensors, is introduced along with a novel parallel algorithm for tensor-matrix multiplication; CSF offers operation reductions similar to existing compressed methods while using only a single tensor structure.
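A rough picture of a CSF-style layout: each level of the fiber tree stores a pointer array delimiting its children in the next level plus an index array of coordinates, generalizing CSR from two levels to one level per tensor mode. The field names below are illustrative, not the paper's, and the real structure also supports reordering the mode hierarchy.

#include <cstddef>
#include <vector>

// Sketch of a CSF-like layout for a 3-way sparse tensor.
struct Csf3 {
  std::vector<int> idx0;   // level 0: distinct i coordinates (fiber-tree roots)
  std::vector<int> ptr1;   // level 1: per-root extent into idx1, size idx0.size() + 1
  std::vector<int> idx1;   // level 1: j coordinates
  std::vector<int> ptr2;   // level 2: per-(i,j) extent into idx2, size idx1.size() + 1
  std::vector<int> idx2;   // level 2: k coordinates
  std::vector<double> val; // nonzero values, aligned with idx2
};

// Visiting every nonzero: walk the three levels like nested CSR.
double sum_all(const Csf3& T) {
  double s = 0.0;
  for (std::size_t a = 0; a < T.idx0.size(); ++a)
    for (int b = T.ptr1[a]; b < T.ptr1[a + 1]; ++b)
      for (int c = T.ptr2[b]; c < T.ptr2[b + 1]; ++c)
        s += T.val[c];
  return s;
}
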
OSKI: A Library of Automatically Tuned Sparse Matrix Kernels
TLDR
An overview of OSKI, which is based on research on automatically tuned sparse kernels for modern cache-based superscalar machines, is provided; the primary aim of its interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine.