# Simple and deterministic matrix sketching

@article{Liberty2013SimpleAD, title={Simple and deterministic matrix sketching}, author={Edo Liberty}, journal={Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining}, year={2013} }

A sketch of a matrix A is another matrix B which is significantly smaller than A but still approximates it well. Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can only be processed once and storage is severely limited. In this paper we adapt a well known streaming algorithm for approximating item…
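The row-at-a-time sketching idea in the abstract can be illustrated with a minimal Python sketch of the Frequent Directions procedure usually associated with this paper. The function name `frequent_directions`, the parameter `ell` (number of sketch rows), and the choice to shrink by the squared singular value at index `ell // 2` are assumptions made for this illustration, not details taken from the text above; the code also assumes `ell <= d`.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Maintain an ell x d sketch B of streamed rows (illustrative; assumes ell <= d)."""
    B = None
    for row in rows:
        row = np.asarray(row, dtype=float)
        if B is None:
            B = np.zeros((ell, row.shape[0]))
        # Indices of sketch rows that are still entirely zero.
        zero_rows = np.flatnonzero(~B.any(axis=1))
        if zero_rows.size == 0:
            # Sketch is full: rotate with the SVD and shrink all directions so that
            # at least half of the rows become zero again.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt
            zero_rows = np.flatnonzero(~B.any(axis=1))
        # Each input row is read exactly once and written into a free slot.
        B[zero_rows[0]] = row
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 30))
    B = frequent_directions(A, ell=10)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    print(B.shape, err, 2 * np.linalg.norm(A, "fro") ** 2 / 10)
```

For a sketch built this way, the spectral error ‖AᵀA − BᵀB‖₂ is expected to stay within 2‖A‖_F²/ℓ, which can be checked numerically on small inputs as in the usage lines above.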

## 267 Citations

Efficient Matrix Sketching over Distributed Data

- Computer Science
- PODS
- 2017

This paper considers the problem of computing a sketch of a massive data matrix A ∈ ℝ^{n×d} that is distributed across a large number s of servers, and gives a new algorithm for distributed PCA with improved communication cost.

Randomization or Condensation?: Linear-Cost Matrix Sketching Via Cascaded Compression Sampling

- Computer Science
- KDD
- 2017

An interesting theoretic connection between matrix low-rank decomposition and lossy signal compression is uncovered, based on which a cascaded compression sampling framework is devised to approximate an m-by-n matrix in only O(m+n) time and space.

Seeing the Forest from the Trees in Two Looks: Matrix Sketching by Cascaded Bilateral Sampling

- Computer Science
- ArXiv
- 2016

The cascaded bilateral sampling (CABS) framework is proposed, and the rise of approximation quality is shown to be lower-bounded by the improvement of encoding powers in the follow-up sampling step, which theoretically guarantees the algorithmic boosting property.

Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices

- Computer Science
- ICML
- 2018

New space-optimal algorithms with faster running times are provided, and it is shown that the running times of these algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.

Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA

- Computer Science
- J. Mach. Learn. Res.
- 2021

This paper proves an almost tight deterministic communication lower bound, then provides a new randomized algorithm with communication cost smaller than the deterministic lower bound and gives an improved distributed PCA algorithm for sparse input matrices, which uses the distributed sketching algorithm as a key building block.

Low Rank Approximation Lower Bounds in Row-Update Streams

- Computer Science
- NIPS
- 2014

A lower bound of Ω(dk/ε) bits of space is given, almost matching the upper bound of Ghashami and Phillips up to the word size and improving on a simple Ω(dk) space lower bound.

Matrix Sketching Over Sliding Windows

- Computer Science
- SIGMOD Conference
- 2016

It is shown that maintaining AᵀA exactly requires linear space in the sliding window model, as opposed to O(d²) space in the streaming model, and three general frameworks for matrix sketching on sliding windows are presented.
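The streaming-model space figure in this summary can be made concrete: the Gram matrix is a sum of per-row outer products, so a single d × d accumulator suffices when rows only arrive and never expire. The snippet below is a minimal illustration of that observation only, not code from the cited paper; the function name `streaming_gram` is hypothetical.

```python
import numpy as np


def streaming_gram(rows, d):
    """Accumulate the d x d Gram matrix of streamed rows using O(d^2) memory."""
    G = np.zeros((d, d))
    for row in rows:
        r = np.asarray(row, dtype=float)
        # Each row contributes one rank-one term and can then be discarded.
        G += np.outer(r, r)
    return G
```

In a sliding window the contribution of an expired row would have to be subtracted again, which is one informal way to see why the rows themselves may need to be remembered in that model.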

Frequent Directions for Matrix Sketching with Provable Bounds: A Generalized Approach

- Computer Science
- 2020

A Generalized Frequent Directions (GFD) algorithm for matrix sketching is proposed, which captures all the previous FD algorithms as special cases without losing any of the theoretical bounds.

Sketching Discrete Valued Sparse Matrices

- Computer Science
- 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
- 2018

This paper proposes two novel algorithms that can efficiently recover a discrete valued sparse matrix from its sketch and presents a low-complexity message passing based recovery algorithm which exploits any sparsity structure that is present in X.

More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data

- Computer Science
- KDD
- 2015

This work provides the first linear time approximation scheme (LTAS) for the rank-one NMF, and proves that applying existing algorithms on the resulting coreset can be turned into (1+ε)-approximations for the original (large) input matrix.

## References

SHOWING 1-10 OF 38 REFERENCES

A sparse Johnson-Lindenstrauss transform

- Computer Science
- STOC '10
- 2010

A sparse version of the Johnson-Lindenstrauss transform, the fundamental tool in dimension reduction, is obtained, using hashing and local densification to construct a sparse projection matrix with just Õ(1/ε) non-zero entries per column, and a matching lower bound on the sparsity for a large class of projection matrices is shown.

Numerical linear algebra in the streaming model

- Computer Science
- STOC '09
- 2009

Near-optimal space bounds are given in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank; results for turnstile updates are proved.

Low-Rank Approximation and Regression in Input Sparsity Time

- Computer Science
- ArXiv
- 2012

We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥₂ = (1 ± ε)∥Ax∥₂ simultaneously for all x ∈ ℝ^d. Here, m is…

Improved Approximation Algorithms for Large Matrices via Random Projections

- Computer Science
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time pass efficient matrix computation.

Relative-Error CUR Matrix Decompositions

- Computer Science, Mathematics
- SIAM J. Matrix Anal. Appl.
- 2008

These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.

Adaptive Sampling and Fast Low-Rank Matrix Approximation

- Computer Science, Mathematics
- APPROX-RANDOM
- 2006

We prove that any real matrix A contains a subset of at most 4k/ε + 2k log(k+1) rows whose span “contains” a matrix of rank at most k with error only (1+ε) times the error of the best rank-k…

Fast Monte-Carlo algorithms for finding low-rank approximations

- Computer Science
- JACM
- 2004

An algorithm is developed that is qualitatively faster, provided the authors may sample the entries of the matrix in accordance with a natural probability distribution, and implies that in constant time, it can be determined if a given matrix of arbitrary size has a good low-rank approximation.

A Fast Random Sampling Algorithm for Sparsifying Matrices

- Computer Science
- APPROX-RANDOM
- 2006

A simple random-sampling based procedure for producing sparse matrix approximations is presented; it computes the approximation in a single pass over the data, leading to substantial savings in space.

An improved approximation algorithm for the column subset selection problem

- Mathematics, Computer Science
- SODA
- 2009

A novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A, and it is proved that the spectral norm bound improves upon the best previously-existing result and is roughly O(√(k!)) better than the best previous algorithmic result.

Fast approximation of matrix coherence and statistical leverage

- Computer Science
- ICML
- 2012

A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
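For orientation, the quantity being approximated in the last entry can be computed exactly on small inputs. The snippet below is the standard exact baseline (squared row norms of an orthonormal basis for the column space), not the randomized algorithm the entry describes; the function name `leverage_scores` is hypothetical, and the code assumes A has full column rank.

```python
import numpy as np


def leverage_scores(A):
    """Exact statistical leverage scores of the rows of an n x d matrix A (n >= d)."""
    # Thin QR: the columns of Q are an orthonormal basis for the column space of A.
    Q, _ = np.linalg.qr(np.asarray(A, dtype=float), mode="reduced")
    # The i-th leverage score is the squared Euclidean norm of the i-th row of Q.
    return np.sum(Q ** 2, axis=1)
```

Under the full-column-rank assumption, the scores lie in [0, 1] and sum to d, which gives a quick sanity check for any approximate method.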