# Parallel algorithms for computing the tensor-train decomposition

```bibtex
@article{Shi2021ParallelAF,
  title   = {Parallel algorithms for computing the tensor-train decomposition},
  author  = {Tianyi Shi and Max E. Ruth and Alex Townsend},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2111.10448}
}
```

The tensor-train (TT) decomposition expresses a tensor in a data-sparse format used in molecular simulations, high-order correlation functions, and optimization. In this paper, we propose four parallelizable algorithms that compute the TT format from various tensor inputs: (1) Parallel-TTSVD for traditional format, (2) PSTT and its variants for streaming data, (3) Tucker2TT for Tucker format, and (4) TT-fADI for solutions of Sylvester tensor equations. We provide theoretical guarantees of…
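For context, the serial TT-SVD that the paper's parallel algorithms are designed to accelerate can be sketched in a few lines of NumPy. This is a generic illustration, not code from the paper; the relative singular-value cutoff `eps` is an illustrative truncation rule rather than the paper's error-based one.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Serial TT-SVD: sweep over unfoldings, truncating an SVD at each step.
    The rank cutoff `eps` (relative to the largest singular value) is an
    illustrative choice."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a full tensor (for checking accuracy)."""
    res = cores[0]
    for core in cores[1:]:
        res = np.tensordot(res, core, axes=([res.ndim - 1], [0]))
    return res.reshape([c.shape[1] for c in cores])
```

Because each step works on an unfolding of the partially compressed data, the sweep is inherently sequential; removing that dependency is exactly what parallel variants such as Parallel-TTSVD address.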

## 5 Citations

### Streaming Tensor Train Approximation

- Computer Science
- ArXiv
- 2022

This work introduces the Streaming Tensor Train Approximation (STTA), a new class of algorithms for approximating a given tensor T in the tensor train format that accesses the data exclusively via two-sided random sketches, making it streamable and easy to implement in parallel.
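The "two-sided random sketch" mechanism is easiest to see in the matrix case. The following is a hypothetical matrix analogue (in the style of a generalized Nyström approximation), not STTA itself: the data arrives as a stream of rank-one updates, only the two sketches are ever stored, and the approximation is recovered at the end.

```python
import numpy as np

def two_sided_sketch_approx(updates, m, n, r, seed=0):
    """Hypothetical matrix analogue of two-sided streaming sketches.
    A = sum of rank-1 updates u v^T is never formed; we maintain only
    X = A @ Om and Y = Psi @ A, then recover A ~= X (Psi X)^+ Y."""
    rng = np.random.default_rng(seed)
    Om = rng.standard_normal((n, r))         # right sketch matrix
    Psi = rng.standard_normal((r + 4, m))    # left sketch, oversampled
    X, Y = np.zeros((m, r)), np.zeros((r + 4, n))
    for u, v in updates:                     # single pass over the stream
        X += np.outer(u, v @ Om)             # sketch of (u v^T) @ Om
        Y += np.outer(Psi @ u, v)            # sketch of Psi @ (u v^T)
    return X @ np.linalg.pinv(Psi @ X) @ Y
```

If the streamed matrix has rank at most r, this recovery is exact with probability one; STTA applies the same idea mode by mode to obtain the TT cores.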

### Generative modeling via tensor train sketching

- Computer Science
- SSRN Electronic Journal
- 2022

A sketching algorithm for constructing a tensor train representation of a probability density from its samples is introduced and it is proved that the tensor cores can be recovered with a sample complexity that is constant with respect to the dimension.

### High-dimensional density estimation with tensorizing flow

- Computer Science
- ArXiv
- 2022

The proposed tensorizing method combines the optimization-less feature of the tensor-train with the flexibility of the flow-based generative models to estimate high-dimensional probability density functions from the observed data.

### Implicit step-truncation integration of nonlinear PDEs on low-rank tensor manifolds

- Computer Science
- ArXiv
- 2022

A new class of implicit rank-adaptive algorithms for temporal integration of nonlinear evolution equations on tensor manifolds based on performing one time step with a conventional time-stepping scheme, followed by an implicit fixed-point iteration step involving a rank-adaptive truncation operation onto a tensor manifold.

### Tensor rank reduction via coordinate flows

- Computer Science
- ArXiv
- 2022

The effectiveness of the proposed new tensor rank reduction method is demonstrated on prototype function approximation problems and on the numerical solution of the Liouville equation in dimensions three and six.

## References

Showing 1-10 of 49 references.

### Parallel Tensor Train through Hierarchical Decomposition

- Computer Science
- 2020

It is proved that the ranks of the TT representation produced by the algorithm are bounded by the ranks of the unfolding matrices of the tensor, and it is shown that the approach which transmits leading singular values to both of its children performs better in practice.

### Parallel Algorithms for Tensor Train Arithmetic

- Computer Science
- SIAM J. Sci. Comput.
- 2022

This work considers algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation) that are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure.
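Of the kernel operations listed, rounding (rank truncation) is the least obvious. A serial NumPy sketch of the standard orthogonalize-then-truncate sweep follows; it is illustrative, not the parallel version from the cited work, and the relative cutoff `eps` is an assumed truncation rule.

```python
import numpy as np

def tt_round(cores, eps=1e-10):
    """Serial sketch of TT rounding: right-to-left orthogonalization,
    then a left-to-right truncated-SVD sweep. Core k has shape
    (r_k, n_k, r_{k+1})."""
    cores = [c.copy() for c in cores]
    d = len(cores)
    for k in range(d - 1, 0, -1):            # right-to-left orthogonalization
        r, n, s = cores[k].shape
        Q, R = np.linalg.qr(cores[k].reshape(r, n * s).T)
        cores[k] = Q.T.reshape(-1, n, s)
        cores[k - 1] = np.tensordot(cores[k - 1], R.T, axes=([2], [0]))
    for k in range(d - 1):                   # left-to-right rank truncation
        r, n, s = cores[k].shape
        U, sv, Vt = np.linalg.svd(cores[k].reshape(r * n, s),
                                  full_matrices=False)
        rk = max(1, int(np.sum(sv > eps * sv[0])))
        cores[k] = U[:, :rk].reshape(r, n, rk)
        cores[k + 1] = np.tensordot(sv[:rk, None] * Vt[:rk], cores[k + 1],
                                    axes=([1], [0]))
    return cores
```

Rounding matters because TT addition and elementwise multiplication inflate the ranks (they add and multiply, respectively), so iterative solvers must truncate after every few operations.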

### High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors

- Computer Science
- 2016 45th International Conference on Parallel Processing (ICPP)
- 2016

A set of preprocessing steps is discussed which takes all computational decisions out of the main iteration of the algorithm and provides intuitive shared-memory parallelism for the TTM and TRSVD steps.

### Parallel Algorithms for Low Rank Tensor Arithmetic

- Computer Science
- Advances in Mechanics and Mathematics
- 2019

Seven parallel algorithms are presented that perform arithmetic operations on tensors in the hierarchical Tucker (HT) format, assuming the tensor data to be distributed over several compute nodes, in order to perform post-processing on solution tensors.

### Model-Driven Sparse CP Decomposition for Higher-Order Tensors

- Computer Science
- 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2017

A novel, adaptive tensor memoization algorithm, AdaTM, which allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework, making its performance more scalable for higher-order data problems.

### MERACLE: Constructive layer-wise conversion of a Tensor Train into a MERA

- Computer Science
- Communications on Applied Mathematics and Computation
- 2020

Two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA), resulting in both computationally and storage efficient algorithms.

### SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

- Computer Science
- 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015

Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…

### Low-Rank Tucker Approximation of a Tensor From Streaming Data

- Computer Science
- SIAM J. Math. Data Sci.
- 2020

A new algorithm for computing a low-Tucker-rank approximation of a tensor that applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes.
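The per-mode randomized-map idea can be illustrated with a simpler two-pass randomized HOSVD. This is a hedged sketch, not the cited one-pass algorithm: the one-pass method replaces the second pass over the data with an additional core sketch, which this simplified variant omits.

```python
import numpy as np

def randomized_hosvd(T, ranks, seed=0):
    """Two-pass randomized HOSVD sketch: a randomized range finder per
    mode, then the core is formed by projecting the tensor a second
    time. Keeps the per-mode sketching idea, not the streaming property."""
    rng = np.random.default_rng(seed)
    factors, core = [], T
    for k, rk in enumerate(ranks):
        unf = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)  # mode-k unfolding
        Om = rng.standard_normal((unf.shape[1], rk + 5))    # oversampled map
        Q, _ = np.linalg.qr(unf @ Om)                       # range finder
        factors.append(Q[:, :rk])
        core = np.tensordot(core, factors[-1], axes=([0], [0]))
    return core, factors
```

When the tensor has exact multilinear rank at most `ranks`, the factor matrices capture each mode's range with probability one and the reconstruction is exact; otherwise oversampling controls the approximation error.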

### Fast and Accurate Randomized Algorithms for Low-rank Tensor Decompositions

- Computer Science
- NeurIPS
- 2021

A fast and accurate sketched ALS algorithm for Tucker decomposition, which solves a sequence of sketched rank-constrained linear least squares subproblems, and which not only converges faster, but also yields more accurate CP decompositions.
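The core trick in sketched ALS is sketch-and-solve on each least-squares subproblem. A hypothetical minimal version for a single dense problem (the cited work uses structured sketches tailored to Kronecker-structured design matrices, which this sketch does not attempt):

```python
import numpy as np

def sketched_lstsq(A, b, s, seed=0):
    """Hypothetical sketch-and-solve step: compress the tall problem
    min ||A x - b|| with a Gaussian sketch S, then solve the small
    problem min ||S A x - S b||."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((s, A.shape[0])) / np.sqrt(s)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

Since each ALS iteration solves many such subproblems whose row count scales with the tensor size, sketching them down to `s` rows is where the speedup comes from.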

### Parallel Tensor Compression for Large-Scale Scientific Data

- Computer Science
- 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2016

This work presents the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts.