Tall and skinny QR factorizations in MapReduce architectures

@inproceedings{Constantine2011TallAS,
  title={Tall and skinny QR factorizations in MapReduce architectures},
  author={Paul G. Constantine and David F. Gleich},
  booktitle={MapReduce '11},
  year={2011}
}
The QR factorization is one of the most important and useful matrix factorizations in scientific computing. A recent communication-avoiding version of the QR factorization trades flops for messages and is ideal for MapReduce, where computationally intensive processes operate locally on subsets of the data. We present an implementation of the tall and skinny QR (TSQR) factorization in the MapReduce framework, and we provide computational results for nearly terabyte-sized datasets. These tasks… 
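The map/reduce split described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's Hadoop implementation: each map task factors its local row block and emits only the small triangular factor, and a single reduce task stacks those factors and factors them once more.

```python
import numpy as np

def tsqr_mapreduce(A, num_blocks=4):
    """Sketch of TSQR as one map and one reduce (illustrative helper,
    not the paper's Hadoop code)."""
    # Map: each task runs a local QR on its row block and emits only
    # the small n-by-n factor R_i.
    blocks = np.array_split(A, num_blocks, axis=0)
    local_Rs = [np.linalg.qr(block)[1] for block in blocks]
    # Reduce: stack the R_i factors and factor the stack once more;
    # the result is the R factor of the full matrix A.
    return np.linalg.qr(np.vstack(local_Rs))[1]

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))      # tall and skinny: m >> n
R = tsqr_mapreduce(A)
# R^T R = A^T A, so R agrees with the R of A's QR up to row signs.
assert np.allclose(R.T @ R, A.T @ A)
```

Only the n-by-n factors cross the network while the expensive local factorizations stay on the data, which is the flops-for-messages trade the abstract refers to.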

Citations

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures
TLDR
This paper describes how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than 2 passes over the data, and finds that the new stable method is competitive with unstable methods for matrices with a modest number of columns.
Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices
TLDR
This paper shows how to make these algorithms scalable for data matrices that have many more rows than columns, so-called "tall-and-skinny matrices", and demonstrates the efficacy of these algorithms on terabyte-sized matrices from scientific computing and bioinformatics.
Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection
TLDR
This work presents a realistic performance model indicating that TSQR can become slower than Householder QR as the number of columns of the target matrix increases, and uses these models to estimate the difference and select the faster algorithm automatically, a form of auto-tuning.
A Survey of Singular Value Decomposition Methods for Distributed Tall/Skinny Data
  • D. Schmidt
  • Computer Science
    2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)
  • 2020
TLDR
A survey of three different algorithms for computing the Singular Value Decomposition for these kinds of tall/skinny data layouts using MPI for communication is presented and contextualized with common big-data analytics techniques.
QR Decomposition in a Multicore Environment
In this study we examine performance benefits of implementing the QR decomposition in a way that takes advantage of multiple processes or threads. This is done by partitioning the matrix into blocks
Exploring Dual-Triangular Structure for Efficient R-Initiated Tall-Skinny QR on GPGPU
TLDR
A novel R-initiated TSQR is proposed to make computing tall-and-skinny QR on the GPGPU efficient; it not only fits within the GPGPU's memory limits but also avoids large amounts of data transmission.
Singular Value Decomposition on Spark
The Singular Value Decomposition of a matrix is one of the most fundamental matrix factorizations in scientific computing. It is used in a variety of applications in machine learning and data mining,
Acceleration of Parallel-Blocked QR Decomposition of Tall-and-Skinny Matrices on FPGAs
TLDR
This work proposes a high-throughput FPGA-based engine that has a very high computational efficiency (ratio of achieved to peak throughput) compared to similar QR solvers running on FPGAs.
Performance Optimization of the SSVD Collaborative Filtering Algorithm on MapReduce Architectures
  • ShiouCheng Yu, Quey-Liang Kao, Che-Rung Lee
  • Computer Science
    2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech)
  • 2016
TLDR
Experiments showed that although the MapReduce architecture differs from traditional high-performance computing environments, these techniques remain very effective: nearly an eight-times speedup is achieved for the most time-consuming job, and more than a five-times speedup for the entire program on large datasets.
Scalable Massively Parallel Learning of Multiple Linear Regression Algorithm with MapReduce
TLDR
This paper introduces a new distributed training method for multiple linear regression that combines the widely used MapReduce framework with the QR decomposition and the ordinary least squares method.
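One way such a pipeline can work is to combine TSQR's R factor with a map-reduce sum for A^T b and solve the semi-normal equations R^T R x = A^T b. This is a sketch under that assumption; `ols_from_r` is a hypothetical helper, not the cited paper's method.

```python
import numpy as np

def ols_from_r(blocks_A, blocks_b):
    # Map: local QR per row block; reduce: one more QR of the stacked R_i.
    Rs = [np.linalg.qr(Ai)[1] for Ai in blocks_A]
    R = np.linalg.qr(np.vstack(Rs))[1]
    # A^T b is a simple map-reduce sum over the same row blocks.
    Atb = sum(Ai.T @ bi for Ai, bi in zip(blocks_A, blocks_b))
    # Semi-normal equations R^T R x = A^T b: forward solve, then back solve.
    y = np.linalg.solve(R.T, Atb)
    return np.linalg.solve(R, y)

rng = np.random.default_rng(2)
A = rng.standard_normal((800, 4))
b = rng.standard_normal(800)
x = ols_from_r(np.array_split(A, 4), np.array_split(b, 4))
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```

Neither A nor Q is ever assembled on one machine; only the n-by-n factor R and the n-vector A^T b are reduced.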
...

References

SHOWING 1-10 OF 28 REFERENCES
Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce
TLDR
This paper leverages the MPI library capabilities by using user-defined MPI operations and MPI_Reduce to perform a QR factorization of a tall and skinny matrix with n columns as a reduction.
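The user-defined-reduction idea can be mimicked without MPI. In this hedged NumPy sketch, the combine operation stacks two triangular factors and re-factors them, applied in a binary tree the way MPI_Reduce would apply a user-defined op across ranks (function names are illustrative, not from the paper).

```python
import numpy as np

def qr_combine(Ra, Rb):
    # The "user-defined reduction op": stack two R factors, re-factor,
    # and keep only the new R. Associative up to row signs.
    return np.linalg.qr(np.vstack([Ra, Rb]))[1]

def tree_reduce_R(blocks):
    # Per-"rank" local QR, then a binary-tree reduction, as MPI_Reduce
    # would apply qr_combine across ranks.
    Rs = [np.linalg.qr(b)[1] for b in blocks]
    while len(Rs) > 1:
        Rs = [qr_combine(Rs[i], Rs[i + 1]) if i + 1 < len(Rs) else Rs[i]
              for i in range(0, len(Rs), 2)]
    return Rs[0]

rng = np.random.default_rng(1)
A = rng.standard_normal((640, 6))
R = tree_reduce_R(np.array_split(A, 8, axis=0))
assert np.allclose(R.T @ R, A.T @ A)
```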
QR factorization of tall and skinny matrices in a grid computing environment
TLDR
A recently proposed algorithm (Communication-Avoiding QR) is articulated with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites.
Communication-avoiding parallel and sequential QR factorizations
TLDR
Both parallel and sequential performance results show that TSQR outperforms competing methods, and that CAQR (Communication-Avoiding QR), which factors general rectangular matrices distributed in a two-dimensional block-cyclic layout, removes a latency bottleneck in ScaLAPACK's current parallel approach.
MapReduce: Simplified Data Processing on Large Clusters
TLDR
This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
TSQR on EC2 Using the Nexus Substrate
TLDR
This project uses the substrate's native APIs and decided against MPI, in large part because others had already implemented TSQR using MPI, though a fully functioning implementation could not be obtained for comparison.
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance
TLDR
The content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed-memory computers, are outlined, and alternative approaches to mathematical libraries are suggested, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.
A Survey of Parallel Algorithms in Numerical Linear Algebra.
TLDR
A comprehensive survey of parallel techniques for problems in linear algebra is given, specific topics include: relevant computer models and their consequences for programs, evaluation of arithmetic expressions, solution of general and special linear systems of equations, and computation of eigenvalues.
Automatically Tuned Linear Algebra Software
TLDR
An approach is presented for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units, using the widely used linear algebra kernels known as the Basic Linear Algebra Subprograms (BLAS).
A Block Orthogonalization Procedure with Constant Synchronization Requirements
TLDR
An alternative orthonormalization method is proposed that computes the orthonormal basis from the right singular vectors of a matrix, along with a hybrid of Gram-Schmidt, Householder, and a phase of the new method.
Faster least squares approximation
TLDR
This work presents two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms.
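The randomized idea can be illustrated with a sketch-and-solve least-squares example. A plain Gaussian sketch is used here for simplicity, and the sizes are illustrative; the cited work uses a fast structured transform rather than a dense Gaussian one.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 5000, 10, 200                 # s sketched rows, with s << m
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Compress the tall problem with a random sketch, then solve the
# small s-by-n least-squares problem instead of the m-by-n one.
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
# With high probability the sketched residual ||A x_sketch - b|| is
# within a small relative factor of the optimal residual.
```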
...