Parallel algorithms for computing the tensor-train decomposition

Tianyi Shi, Max E. Ruth, and Alex Townsend
Corpus ID: 244478448
The tensor-train (TT) decomposition expresses a tensor in a data-sparse format used in molecular simulations, high-order correlation functions, and optimization. In this paper, we propose four parallelizable algorithms that compute the TT format from various tensor inputs: (1) Parallel-TTSVD for tensors given in conventional (dense) format, (2) PSTT and its variants for streaming data, (3) Tucker2TT for tensors given in Tucker format, and (4) TT-fADI for solutions of Sylvester tensor equations. We provide theoretical guarantees of…
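As background for the parallel variants above, here is a minimal sequential TT-SVD sketch in NumPy (a classical Oseledets-style baseline, not code from the paper; the function names and tolerance handling are illustrative):

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Sequential TT-SVD: sweep left to right, peeling off one core per
    mode via a truncated SVD of the current unfolding matrix."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        keep = max(1, int(np.sum(S > eps * S[0])))  # relative truncation
        cores.append(U[:, :keep].reshape(rank, dims[k], keep))
        rank = keep
        mat = (S[:keep, None] * Vt[:keep]).reshape(rank * dims[k + 1], -1)
    cores.append(mat.reshape(rank, dims[-1], 1))    # last core
    return cores

def tt_to_full(cores):
    """Contract TT cores back to a dense tensor (for verification)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]
```

The paper's parallel algorithms reorganize exactly this kind of SVD sweep so that independent unfoldings can be factored concurrently.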

Streaming Tensor Train Approximation

This work introduces the Streaming Tensor Train Approximation (STTA), a new class of algorithms for approximating a given tensor T in the tensor train format that operates exclusively on two-sided random sketches of the original data, making it streamable and easy to implement in parallel.
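Since sketching is linear, two-sided sketches of additive updates can be accumulated without ever storing the full data. Here is a toy single-matrix illustration of that principle (a Tropp-style one-pass reconstruction, not STTA itself; the sizes, names, and exactly low-rank setup are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 4

# Exactly rank-r data that arrives as a stream of additive updates:
# A = L @ (R1 + R2 + R3), streamed one term at a time.
L = rng.standard_normal((n, r))
Rs = [rng.standard_normal((r, m)) for _ in range(3)]
A = L @ sum(Rs)

k, l = r + 2, r + 4                    # oversampled sketch sizes
Omega = rng.standard_normal((m, k))    # right test matrix
Psi = rng.standard_normal((l, n))      # left test matrix

# Streaming phase: sketches accumulate linearly; A itself is never stored.
Y = np.zeros((n, k))
W = np.zeros((l, m))
for Rt in Rs:
    H = L @ Rt                         # one streamed update
    Y += H @ Omega
    W += Psi @ H

# One-pass reconstruction: A_hat = Y (Psi Y)^+ W, exact when rank(A) <= k, l.
A_hat = Y @ np.linalg.pinv(Psi @ Y) @ W
```

STTA applies this idea to every unfolding of the tensor at once to assemble the TT cores.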

Generative modeling via tensor train sketching

A sketching algorithm for constructing a tensor train representation of a probability density from its samples is introduced and it is proved that the tensor cores can be recovered with a sample complexity that is constant with respect to the dimension.

High-dimensional density estimation with tensorizing flow

The proposed tensorizing method combines the optimization-less feature of the tensor-train with the flexibility of flow-based generative models to estimate high-dimensional probability density functions from the observed data.

Implicit step-truncation integration of nonlinear PDEs on low-rank tensor manifolds

A new class of implicit rank-adaptive algorithms is proposed for temporal integration of nonlinear evolution equations on tensor manifolds, based on performing one time step with a conventional time-stepping scheme followed by an implicit fixed-point iteration step involving a rank-adaptive truncation operation onto a tensor manifold.

Tensor rank reduction via coordinate flows

The effectiveness of the proposed tensor rank reduction method is demonstrated on prototype function-approximation problems and in computing the numerical solution of the Liouville equation in dimensions three and six.

Parallel Tensor Train through Hierarchical Decomposition

It is proved that the ranks of the TT representation produced by the algorithm are bounded by the ranks of the unfolding matrices of the tensor, and it is shown that the approach that transmits leading singular values to both of its children performs better in practice.

Parallel Algorithms for Tensor Train Arithmetic

This work considers algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation) that are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure.
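As a concrete instance of one such kernel, TT addition stacks corresponding cores block-diagonally, so the ranks add and a subsequent rounding step restores compression. A minimal serial NumPy sketch (the cited work's contribution is the distributed-memory versions of these kernels):

```python
import numpy as np

def tt_add(a_cores, b_cores):
    """z = x + y in TT format: ranks add, cores stack block-diagonally."""
    d = len(a_cores)
    out = []
    for k, (A, B) in enumerate(zip(a_cores, b_cores)):
        ra1, n, ra2 = A.shape
        rb1, _, rb2 = B.shape
        if k == 0:                      # first core: stack column ranks
            C = np.concatenate([A, B], axis=2)
        elif k == d - 1:                # last core: stack row ranks
            C = np.concatenate([A, B], axis=0)
        else:                           # middle cores: block-diagonal
            C = np.zeros((ra1 + rb1, n, ra2 + rb2))
            C[:ra1, :, :ra2] = A
            C[ra1:, :, ra2:] = B
        out.append(C)
    return out

def tt_to_full(cores):
    """Contract TT cores back to a dense tensor (for checking only)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]
```

Elementwise multiplication follows the same pattern with Kronecker products of core slices, which is why rank truncation (rounding) is the companion kernel in practice.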

High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors

  • O. Kaya, B. Uçar
  • Computer Science
    2016 45th International Conference on Parallel Processing (ICPP)
  • 2016
A set of preprocessing steps is discussed that takes all computational decisions out of the main iteration of the algorithm and provides intuitive shared-memory parallelism for the TTM and TRSVD steps.
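For dense tensors, the TTM (tensor-times-matrix) kernel mentioned here is a single-mode contraction; a minimal dense NumPy version for reference (the cited paper's contribution is the sparse, shared-memory parallel case):

```python
import numpy as np

def ttm(tensor, matrix, mode):
    """Mode-`mode` product: contracts axis `mode` of `tensor` with the
    columns of `matrix` (shape (j, n_mode)), replacing n_mode with j."""
    out = np.tensordot(matrix, tensor, axes=([1], [mode]))
    # tensordot puts the new axis first; move it back to position `mode`.
    return np.moveaxis(out, 0, mode)
```

In the Tucker setting, a sequence of TTMs with the factor matrices produces the core tensor, which is why TTM dominates the cost of each iteration.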

Parallel Algorithms for Low Rank Tensor Arithmetic

Seven parallel algorithms are presented that perform arithmetic operations on tensors in the hierarchical Tucker (HT) format, assuming the tensor data to be distributed over several compute nodes, in order to perform post-processing on solution tensors.

Model-Driven Sparse CP Decomposition for Higher-Order Tensors

A novel adaptive tensor memoization algorithm, AdaTM, is proposed that allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters with a model-driven framework, making its performance more scalable for higher-order data problems.

MERACLE: Constructive layer-wise conversion of a Tensor Train into a MERA

Two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA), resulting in both computationally and storage efficient algorithms.

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…

Low-Rank Tucker Approximation of a Tensor From Streaming Data

A new algorithm for computing a low-Tucker-rank approximation of a tensor that applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes.

Fast and Accurate Randomized Algorithms for Low-rank Tensor Decompositions

A fast and accurate sketched ALS algorithm for Tucker decomposition is proposed, which solves a sequence of sketched rank-constrained linear least-squares subproblems; the approach not only converges faster but also yields more accurate CP decompositions.

Parallel Tensor Compression for Large-Scale Scientific Data

This work presents the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts.