Corpus ID: 222271948

Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions

C. Psarras, L. Karlsson, P. Bientinesi
Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice…
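For context, ALS fits a rank-R CP model by cyclically solving a linear least-squares problem for each factor matrix; the dominant kernel is the MTTKRP (matricized tensor times Khatri-Rao product), whose low arithmetic intensity at small R is what the abstract refers to. The following is a minimal NumPy sketch of plain, single-decomposition CP-ALS for a dense three-way tensor — the baseline the paper improves upon, not the paper's concurrent (CALS) algorithm; all function names here are illustrative:

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product: (I, R) and (J, R) -> (I*J, R)
    R = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, R)

def cp_als(T, rank, n_iter=100, seed=0):
    """Plain CP-ALS for a dense 3-way tensor T; returns factors A, B, C."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings, computed once
    T0 = T.reshape(I, -1)
    T1 = np.moveaxis(T, 1, 0).reshape(J, -1)
    T2 = np.moveaxis(T, 2, 0).reshape(K, -1)
    for _ in range(n_iter):
        # Each update: MTTKRP, then a small R x R solve
        # (Hadamard product of the other factors' Gram matrices)
        A = T0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Note that for small `rank`, each MTTKRP touches every tensor entry but performs only O(rank) flops per entry, which is the low-arithmetic-intensity regime the paper targets by fusing many such decompositions.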
1 Citation


The landscape of software for tensor computations
The aim is to assemble a comprehensive and up-to-date snapshot of the tensor software landscape, with the intention of helping both users and developers.


References

A scalable optimization approach for fitting canonical tensor decompositions
Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor…
Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations
The proposed method is more efficient than the state-of-the-art ALS algorithm which operates two modes at a time (ALSo2) in the Eigenvector PLS toolbox, especially for tensors with order N ≥ 5 and high rank.
ParCube: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition
This work is the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that ParCube enables us to handle very large datasets effectively and efficiently.
Model-Driven Sparse CP Decomposition for Higher-Order Tensors
A novel, adaptive tensor memoization algorithm, AdaTM, allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework, making its performance more scalable for higher-order data problems.
Accelerating Alternating Least Squares for Tensor Decomposition by Pairwise Perturbation
This work introduces a novel family of algorithms that uses perturbative corrections to the subproblems rather than recomputing the tensor contractions, and shows improvements of up to 2.5X with respect to state-of-the-art alternating least squares approaches for various model tensor problems and real datasets.
A Randomized Block Sampling Approach to Canonical Polyadic Decomposition of Large-Scale Tensors
The randomized block sampling canonical polyadic decomposition method presented here combines increasingly popular ideas from randomization and stochastic optimization to tackle the computational problems of large-scale tensors.
PLANC: Parallel Low Rank Approximation with Non-negativity Constraints
This work proposes a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms to compute the low-rank approximation.
SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication
Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…
Extrapolated Alternating Algorithms for Approximate Canonical Polyadic Decomposition
This work proposes several algorithms based on extrapolation that improve over existing alternating methods for aCPD, and shows that carefully designed extrapolation can significantly improve the convergence speed and hence reduce the computational time, especially in difficult scenarios.
Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking
A blockwise computation of the CP gradient is considered, reducing the memory requirements to a constant; a heuristic algorithm for automatically choosing the division into subtensors is part of the proposed method.