Corpus ID: 202540977

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints

@article{Eswar2021PLANCPL,
  title={PLANC: Parallel Low Rank Approximation with Non-negativity Constraints},
  author={Srinivas Eswar and Koby Hayashi and Grey Ballard and Ramakrishnan Kannan and Michael A. Matheson and Haesun Park},
  journal={ACM Trans. Math. Softw.},
  year={2021},
  volume={47},
  pages={20:1-20:37}
}
We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms… 
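
As a point of reference for the objective PLANC optimizes, the sketch below runs plain single-node NMF with multiplicative updates in NumPy. It only illustrates the nonnegativity-constrained low-rank objective; PLANC itself implements distributed-memory alternating-updating algorithms (e.g., MU, HALS, ANLS with block principal pivoting) over MPI, and `nmf_mu` and its parameters are illustrative names, not PLANC's API.

```python
import numpy as np

def nmf_mu(A, r, iters=200, eps=1e-9, seed=0):
    """Rank-r NMF of a nonnegative matrix A via multiplicative updates,
    minimizing ||A - W @ H||_F subject to W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

A = np.random.default_rng(1).random((100, 80))        # synthetic nonnegative data
W, H = nmf_mu(A, r=10)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))  # relative approximation error
```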

Distributed Out-of-Memory NMF of Dense and Sparse Data on CPU/GPU Architectures with Automatic Model Selection for Exascale Data

pyDNMF-GPU, a new distributed out-of-core NMF method designed for modern heterogeneous CPU/GPU architectures, is capable of factoring exascale-sized dense and sparse matrices and integrates with an automatic model selection method.

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

This paper introduces the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order-N input tensor and the first-contracted input matrix only once every (N-1)/N sweeps, and also introduces a more communication-efficient parallelization of pairwise perturbation, an approximate CP-ALS algorithm.
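
The kernel whose repeated evaluation both MSDT and pairwise perturbation aim to amortize is the MTTKRP (matricized tensor times Khatri-Rao product). Below is a minimal dense 3-way version in NumPy, for orientation only; the paper's algorithms work with partial contractions of this kernel, and the function name is illustrative.

```python
import numpy as np

def mttkrp_mode0(T, B, C):
    """Dense mode-0 MTTKRP for a 3-way tensor:
    M[i, r] = sum_{j,k} T[i, j, k] * B[j, r] * C[k, r]."""
    return np.einsum('ijk,jr,kr->ir', T, B, C)

rng = np.random.default_rng(0)
T = rng.random((4, 5, 6))
B, C = rng.random((5, 3)), rng.random((6, 3))
M = mttkrp_mode0(T, B, C)   # shape (4, 3); CP-ALS needs one of these per mode, per sweep
```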

Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions

This article illustrates how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase arithmetic intensity, making it possible to use GPUs efficiently for further speedups.
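
A minimal sketch of the fusion idea, assuming s independent rank-R CP models of one shared tensor: stacking their factor matrices turns s separate MTTKRPs into a single batched contraction that reads the tensor once. The shapes and names below are illustrative, not the article's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R, s = 4, 5, 6, 3, 8
T = rng.random((I, J, K))            # one shared input tensor
B = rng.random((s, J, R))            # s stacked mode-1 factor matrices
C = rng.random((s, K, R))            # s stacked mode-2 factor matrices

# One fused, batched kernel: T is read once for all s decompositions,
# raising arithmetic intensity versus s separate MTTKRP calls.
M = np.einsum('ijk,bjr,bkr->bir', T, B, C)

# Matches s independent mode-0 MTTKRPs.
print(all(np.allclose(M[b], np.einsum('ijk,jr,kr->ir', T, B[b], C[b]))
          for b in range(s)))
```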

Sparsity-Aware Tensor Decomposition

This paper considers a design space covering whether partial MTTKRP results should be saved and which mode permutations to use, models the total volume of data movement to and from memory, and proposes a fine-grained load balancing method that supports higher levels of parallelization.

Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization

A parallel algorithm for hierarchical clustering that uses a divide-and-conquer approach based on rank-two NMF to split a data set into two cohesive parts, finding more structure in the data than a flat NMF clustering.
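
A toy sketch of the divide-and-conquer recursion, assuming a simple multiplicative-update rank-2 NMF as the splitter; the actual algorithm uses a specialized rank-two ANLS solver and a scoring rule for choosing which node to split next, and `rank2_nmf` and `hier_cluster` are hypothetical names.

```python
import numpy as np

def rank2_nmf(A, iters=100, eps=1e-9, seed=0):
    # Stand-in splitter: rank-2 NMF via multiplicative updates.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, 2)), rng.random((2, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def hier_cluster(A, cols, depth):
    """Recursively split a set of column indices into two cohesive parts."""
    if depth == 0 or len(cols) < 2:
        return [cols]
    _, H = rank2_nmf(A[:, cols])
    left = [c for c, h in zip(cols, H.T) if h[0] >= h[1]]   # dominant component
    right = [c for c in cols if c not in left]
    if not left or not right:                               # degenerate split: leaf
        return [cols]
    return hier_cluster(A, left, depth - 1) + hier_cluster(A, right, depth - 1)

A = np.random.default_rng(2).random((50, 40))
print(hier_cluster(A, list(range(40)), depth=2))   # up to 4 leaf clusters
```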

CP Decomposition for Tensors via Alternating Least Squares with QR Decomposition

This paper develops versions of the CP-ALS algorithm using the QR decomposition and the singular value decomposition, which are more numerically stable than the normal equations, to solve the linear least squares problems.
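
The numerical point is easy to demonstrate in isolation: forming the normal equations squares the condition number of the coefficient matrix, while a QR-based solve does not. A generic NumPy illustration follows; in CP-ALS the coefficient matrix would be the Khatri-Rao product of the other factor matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((50, 5))      # coefficient matrix (a Khatri-Rao product in CP-ALS)
b = rng.random(50)

# Normal equations: forming M.T @ M squares the condition number of M.
x_ne = np.linalg.solve(M.T @ M, M.T @ b)

# QR: solve R @ x = Q.T @ b; sensitivity follows kappa(M), not kappa(M)**2.
Q, R = np.linalg.qr(M)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.allclose(x_ne, x_qr))  # agree here; they diverge as M grows ill-conditioned
```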

Parallel Algorithms for Low-Rank Approximations of Matrices and Tensors

  • Lawton Manning
  • Computer Science
  • 2021

References


ParCube: Sparse Parallelizable Tensor Decompositions

ParCube, a new and highly parallelizable method for speeding up tensor decompositions, is proposed; it is well suited to producing sparse approximations, and the paper is the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that ParCube handles very large datasets effectively and efficiently.

High Performance Parallel Algorithms for Tensor Decompositions

The main focus of this thesis is the efficient decomposition of high-dimensional sparse tensors with hundreds of millions to billions of nonzero entries, which arise in many emerging big-data applications; it introduces a tree-based computational scheme that carries out expensive operations faster by factoring out and storing common partial results and effectively re-using them.

A high-performance parallel algorithm for nonnegative matrix factorization

A high-performance distributed-memory parallel algorithm computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H; it maintains the data and factor matrices in memory, uses MPI for interprocessor communication, and provably minimizes communication costs.
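
A single-node sketch of the alternating NLS scheme, using SciPy's generic nnls solver one column or row at a time; the paper's algorithm solves the same two subproblems with faster methods such as block principal pivoting and distributes the matrices across an MPI processor grid. `anls_nmf` and its parameters are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(A, r, outer=20, seed=0):
    """Alternating nonnegative least squares for A ~ W @ H, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = np.zeros((r, n))
    for _ in range(outer):
        # Fix W, solve one NNLS problem per column of H.
        H = np.column_stack([nnls(W, A[:, j])[0] for j in range(n)])
        # Fix H, solve one NNLS problem per row of W.
        W = np.column_stack([nnls(H.T, A[i, :])[0] for i in range(m)]).T
    return W, H

A = np.random.default_rng(1).random((60, 40))
W, H = anls_nmf(A, r=8)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```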

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse…

Efficient and scalable computations with sparse tensors

This paper describes new sparse tensor storage formats that provide storage benefits and are flexible and efficient for performing tensor computations and proposes an optimization that improves data reuse and reduces redundant or unnecessary computations in tensor decomposition algorithms.
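
To make the storage question concrete, here is a mode-0 MTTKRP over the simplest sparse format, COO, in NumPy: work scales with the number of nonzeros rather than the dense size. This is only a baseline for the compute pattern, not the paper's proposed formats, and the names are illustrative.

```python
import numpy as np

def coo_mttkrp_mode0(inds, vals, dims, B, C):
    """Mode-0 MTTKRP over a COO-format sparse 3-way tensor.

    inds: (nnz, 3) integer coordinates, vals: (nnz,) values.
    Only nonzeros are touched, so cost is O(nnz * rank) rather than
    O(prod(dims) * rank) for the dense contraction."""
    i, j, k = inds[:, 0], inds[:, 1], inds[:, 2]
    M = np.zeros((dims[0], B.shape[1]))
    # Scatter-add each nonzero's rank-sized contribution into its row of M.
    np.add.at(M, i, vals[:, None] * B[j] * C[k])
    return M

rng = np.random.default_rng(0)
dims, nnz, R = (4, 5, 6), 20, 3
inds = np.column_stack([rng.integers(0, d, nnz) for d in dims])
vals = rng.random(nnz)
B, C = rng.random((dims[1], R)), rng.random((dims[2], R))
M = coo_mttkrp_mode0(inds, vals, dims, B, C)   # shape (4, 3)
```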

High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors

  • O. Kaya, B. Uçar
  • Computer Science
  • 2016 45th International Conference on Parallel Processing (ICPP)
  • 2016
A set of preprocessing steps that takes all computational decisions out of the main iteration of the algorithm and provides intuitive shared-memory parallelism for the TTM and TRSVD steps is discussed.

Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees

A novel computational scheme for reducing the cost of a core operation in computing the CP decomposition with the traditional alternating least squares (CP-ALS) algorithm is proposed and effectively parallelized in shared- and distributed-memory environments.
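
The core idea of a dimension tree is easy to show on a 3-way tensor: contract the tensor with one factor matrix once, then reuse that partial result for the MTTKRPs of two different modes. A dense NumPy sketch follows; the paper applies the scheme to sparse tensors of general order.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
T = rng.random((I, J, K))
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# Contract the tensor with C once...
TC = np.einsum('ijk,kr->ijr', T, C)          # the expensive step

# ...then reuse that partial result for the mode-0 and mode-1 MTTKRPs.
M0 = np.einsum('ijr,jr->ir', TC, B)
M1 = np.einsum('ijr,ir->jr', TC, A)

# Same results as contracting the full tensor from scratch each time.
print(np.allclose(M0, np.einsum('ijk,jr,kr->ir', T, B, C)))
print(np.allclose(M1, np.einsum('ijk,ir,kr->jr', T, A, C)))
```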

PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization

A parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme, incorporating algorithmic transformations to enhance data locality, is devised, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms.
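
For reference, the base HALS scheme updates one rank-one component at a time, cycling over the rows of H and columns of W. A single-node NumPy sketch of the standard update rule follows; the paper's contribution is a locality-optimized parallel reorganization of these updates, not this reference loop.

```python
import numpy as np

def hals_nmf(A, r, outer=100, eps=1e-12, seed=0):
    """HALS NMF: exact coordinate updates of each rank-one component,
    with a small floor eps to keep columns strictly positive."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(outer):
        G, P = W.T @ W, W.T @ A            # Gram and cross products for H updates
        for t in range(r):
            H[t] = np.maximum(eps, H[t] + (P[t] - G[t] @ H) / G[t, t])
        G, Q = H @ H.T, A @ H.T            # Gram and cross products for W updates
        for t in range(r):
            W[:, t] = np.maximum(eps, W[:, t] + (Q[:, t] - W @ G[:, t]) / G[t, t])
    return W, H

A = np.random.default_rng(1).random((60, 40))
W, H = hals_nmf(A, r=8)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```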

A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization

A medium-grained decomposition of the tensor nonzeros is presented that avoids complete factor replication and communication while eliminating the need for expensive pre-processing steps; it uses a hybrid MPI+OpenMP implementation that exploits multi-core architectures with a low memory footprint.

Model-Driven Sparse CP Decomposition for Higher-Order Tensors

AdaTM, a novel adaptive tensor memoization algorithm, allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters with a model-driven framework, making performance more scalable for higher-order data problems.
...