# Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions

```bibtex
@article{Psarras2020ConcurrentAL,
  title   = {Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions},
  author  = {C. Psarras and L. Karlsson and P. Bientinesi},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.04678}
}
```

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in applications ranging from chemometrics and signal processing to machine learning. A broadly used method for computing such decompositions is the Alternating Least Squares (ALS) algorithm. When the number of components is small, ALS, regardless of its implementation, exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice…
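For context, the standard single-tensor CP-ALS iteration that the paper builds on can be sketched as follows. This is a textbook NumPy sketch of ALS for a 3-way tensor (MTTKRP followed by a small Gram-matrix solve per mode), not the concurrent multi-decomposition scheme the paper proposes; all function and variable names are ours.

```python
import numpy as np

def khatri_rao(X, Y):
    # Column-wise Kronecker product: row (i*n + j) equals X[i, :] * Y[j, :].
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

def cp_als(T, rank, n_iter=50, seed=0):
    """Textbook CP-ALS for a 3-way tensor T of shape (I, J, K).

    Illustrative sketch only: each mode update solves its linear
    least-squares subproblem exactly via the Gram-matrix normal equations.
    """
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))

    # Mode-n unfoldings (C-order flattening, so columns match the
    # Khatri-Rao row ordering used below).
    T1 = T.reshape(I, J * K)                      # columns indexed j*K + k
    T2 = np.moveaxis(T, 1, 0).reshape(J, I * K)   # columns indexed i*K + k
    T3 = np.moveaxis(T, 2, 0).reshape(K, I * J)   # columns indexed i*J + j

    for _ in range(n_iter):
        # A <- T_(1) (B ⊙ C) [(B^T B) * (C^T C)]^+   (* = Hadamard product)
        A = T1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

The MTTKRP (`T1 @ khatri_rao(B, C)`) dominates the cost; for small `rank` it is memory-bound, which is the low-arithmetic-intensity problem the paper addresses by batching many independent decompositions.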

#### One Citation

The landscape of software for tensor computations

- Computer Science
- ArXiv
- 2021

The aim is to assemble a comprehensive and up-to-date snapshot of the tensor software landscape, with the intention of helping both users and developers.

#### References

Showing 1–10 of 57 references.

A scalable optimization approach for fitting canonical tensor decompositions

- Mathematics
- 2011

Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor…

Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations

- Computer Science, Mathematics
- IEEE Transactions on Signal Processing
- 2013

The proposed method is more efficient than the state-of-the-art ALS algorithm in the Eigenvector PLS toolbox, which operates on two modes at a time (ALSo2), especially for tensors with order N ≥ 5 and high rank.

ParCube: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition

- Computer Science
- ACM Trans. Knowl. Discov. Data
- 2015

This work is the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that ParCube enables us to handle very large datasets effectively and efficiently.

Model-Driven Sparse CP Decomposition for Higher-Order Tensors

- Computer Science
- 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2017

AdaTM, a novel adaptive tensor memoization algorithm, allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters with a model-driven framework, making its performance more scalable for higher-order data problems.

Accelerating Alternating Least Squares for Tensor Decomposition by Pairwise Perturbation

- Mathematics, Computer Science
- ArXiv
- 2018

This work introduces a novel family of algorithms that use perturbative corrections to the subproblems rather than recomputing the tensor contractions, and shows improvements of up to 2.5x over state-of-the-art alternating least squares approaches for various model tensor problems and real datasets.

A Randomized Block Sampling Approach to Canonical Polyadic Decomposition of Large-Scale Tensors

- Mathematics, Computer Science
- IEEE Journal of Selected Topics in Signal Processing
- 2016

The randomized block sampling canonical polyadic decomposition method presented here combines increasingly popular ideas from randomization and stochastic optimization to tackle the computational problems of large-scale tensors.

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints

- Computer Science, Mathematics
- ACM Trans. Math. Softw.
- 2021

This work proposes a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms to compute the low-rank approximation.

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

- Computer Science
- 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015

Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a…

Extrapolated Alternating Algorithms for Approximate Canonical Polyadic Decomposition

- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

This work proposes several algorithms based on extrapolation that improve over existing alternating methods for aCPD, and shows that carefully designed extrapolation can significantly improve the convergence speed and hence reduce the computational time, especially in difficult scenarios.

Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking

- Mathematics, Computer Science
- SIAM J. Sci. Comput.
- 2015

A blockwise computation of the CP gradient is considered, reducing the memory requirements to a constant; a heuristic algorithm for automatically choosing the division into subtensors is part of the proposed method.
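The blockwise idea above can be illustrated on the MTTKRP, the dominant kernel in the CP gradient. The sketch below accumulates the mode-1 MTTKRP slab by slab along the third mode, so only one slab's contraction is in flight at a time; the slab size and all names are our illustrative choices, not the paper's actual blocking scheme.

```python
import numpy as np

def mttkrp_mode1_blocked(T, B, C, block=2):
    """Mode-1 MTTKRP, M = T_(1) (B ⊙ C), accumulated over slabs
    T[:, :, k0:k1] so that intermediate memory stays bounded by the
    slab size rather than the full tensor contraction.
    """
    I, J, K = T.shape
    R = B.shape[1]
    M = np.zeros((I, R))
    for k0 in range(0, K, block):
        k1 = min(k0 + block, K)
        # Contribution of one slab: sum_{j,k in slab} T[i,j,k] B[j,r] C[k,r]
        M += np.einsum('ijk,jr,kr->ir', T[:, :, k0:k1], B, C[k0:k1])
    return M
```

Summing slab contributions is exact because the MTTKRP is linear in the tensor entries, which is what makes a constant-memory division into subtensors possible.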