# Matrix Sketching Over Sliding Windows

@article{Wei2016MatrixSO, title={Matrix Sketching Over Sliding Windows}, author={Zhewei Wei and Xuancheng Liu and Feifei Li and Shuo Shang and Xiaoyong Du and Ji-Rong Wen}, journal={Proceedings of the 2016 International Conference on Management of Data}, year={2016} }

Large-scale matrix computation becomes essential for many data applications, and hence the problem of sketching matrices with small space and high precision has received extensive study over the past few years. This problem is often considered in the row-update streaming model, where the data set is a matrix $A \in \mathbb{R}^{n \times d}$, and the processor receives a row ($1 \times d$) of $A$ at each timestamp. The goal is to maintain a smaller matrix (termed approximation matrix, or simply approximation) $B \in \mathbb{R}^{\ell \times d}$…
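
To make the row-update model concrete, here is a minimal sketch of Frequent Directions (Liberty, 2013), the deterministic sketch discussed in the references below. It maintains an $\ell \times d$ matrix $B$ over an unbounded stream, not a sliding window, so it is not this paper's algorithm. The function name `frequent_directions` is hypothetical, and the sketch assumes $1 \le \ell \le d$.

```python
import numpy as np


def frequent_directions(rows: np.ndarray, ell: int) -> np.ndarray:
    """Return an ell x d sketch B of the n x d matrix `rows`, read one row at a time."""
    _, d = rows.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= d")
    B = np.zeros((ell, d))
    next_free = 0  # index of the next all-zero row of B
    for a in rows:
        if next_free == ell:
            # B is full: subtract the smallest squared singular value from all of them,
            # which zeroes (at least) the last row of the rotated sketch.
            _, s, vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            B = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * vt
            next_free = ell - 1
        B[next_free] = a  # insert the new row exactly
        next_free += 1
    return B
```

The standard analysis of this variant gives $\|A^T A - B^T B\|_2 \le \|A\|_F^2 / \ell$ using $O(\ell d)$ space. Frequent Directions has no way to remove a row once it has been absorbed, and the sliding-window setting requires exactly that.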

## 22 Citations

Tracking Matrix Approximation over Distributed Sliding Windows

- Computer Science
- 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
- 2017

This paper proposes sampling-based algorithms that continuously track a weighted sample of rows according to their squared norms, which generalize and simplify the sampling techniques in [2], and deterministic tracking algorithms that require only one-way communication and provide better error guarantee.
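
The primitive behind these trackers is a sample of rows weighted by squared norm. For a single unbounded stream it can be sketched with the Efraimidis-Spirakis key $u^{1/w}$, a standard technique swapped in here for illustration. The sketch ignores both distribution and window expiry, and `norm_weighted_sample` is a hypothetical name.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence, Tuple


def norm_weighted_sample(
    rows: Sequence[Sequence[float]], s: int, seed: Optional[int] = None
) -> List[Tuple[int, Sequence[float]]]:
    """One-pass sample of up to s rows without replacement, weight = squared norm."""
    if s <= 0:
        return []
    rng = random.Random(seed)
    heap: List[Tuple[float, int, Sequence[float]]] = []  # min-heap of the s largest keys
    for i, row in enumerate(rows):
        w = sum(x * x for x in row)
        if w == 0:
            continue  # zero rows carry no weight and are never sampled
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        key = math.log(u) / w  # same order as u ** (1 / w), but numerically safer
        if len(heap) < s:
            heapq.heappush(heap, (key, i, row))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, row))
    return sorted(((i, row) for _, i, row in heap), key=lambda t: t[0])
```

With $s = 1$, row $i$ is returned with probability $\|a_i\|^2 / \|A\|_F^2$.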

Efficient Matrix Sketching over Distributed Data

- Computer Science
- PODS
- 2017

This paper considers the problem of computing a sketch of a massive data matrix $A \in \mathbb{R}^{n \times d}$ that is distributed across a large number $s$ of servers, and gives a new algorithm for distributed PCA with improved communication cost.

Near Optimal Linear Algebra in the Online and Sliding Window Models

- Computer Science
- 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS)
- 2020

This work presents a unified row-sampling-based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and $\ell_{1}$-subspace embeddings in the sliding window model. These algorithms often use nearly optimal space and achieve nearly input-sparsity runtime.

Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order

- Computer Science
- ICML
- 2018

This work studies several aspects of estimating matrix norms in a stream that have not been considered before. It obtains a near-complete characterization of the memory required by row-order algorithms for estimating Schatten norms of sparse matrices.

Smoothness of Schatten Norms and Sliding-Window Matrix Streams

- Computer Science
- Inf. Process. Lett.
- 2022

Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA

- Computer Science
- J. Mach. Learn. Res.
- 2021

This paper proves an almost tight deterministic communication lower bound, then provides a new randomized algorithm with communication cost smaller than the deterministic lower bound and gives an improved distributed PCA algorithm for sparse input matrices, which uses the distributed sketching algorithm as a key building block.

Truly Perfect Samplers for Data Streams and Sliding Windows

- Computer Science, Mathematics
- PODS
- 2022

This work shows that sublinear-space truly perfect sampling is impossible in the turnstile model. It proves a lower bound of $\Omega(\min(n, \log(1/\gamma)))$ for any $G$-sampler with point-wise error $\gamma$ from the true distribution. It also gives a general, time-efficient, sublinear-space framework for developing truly perfect samplers in the insertion-only streaming and sliding window models.

Symmetric Norm Estimation and Regression on Sliding Windows

- Computer Science, Mathematics
- COCOON
- 2021

This work observes that the symmetric norm streaming algorithm of Braverman et al. (STOC 2017) can be reduced to identifying heavy hitters, and approximating their frequencies, in a number of substreams. It then introduces a heavy-hitter algorithm that gives a $(1 + \epsilon)$-approximation to each of the reported frequencies in the sliding window model.

Sketches for Matrix Norms: Faster, Smaller and More General

- Computer Science
- ArXiv
- 2016

It is proved that one can obtain an approximation to $l(A)$ from a sketch $GAH^T$ where $G$ and $H$ are independent Oblivious Subspace Embeddings and the dimension of the sketch is polynomial in the intrinsic dimension of $A$.

Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators

- Computer Science
- 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)
- 2022

The results show there is no separation between the sliding window model and the standard data stream model in terms of the approximation factor, and the first difference estimators for a wide range of problems are developed.

## References


Continuous Matrix Approximation on Distributed Data

- Computer Science
- Proc. VLDB Endow.
- 2014

Novel algorithms to address the matrix approximation problem of "tracking approximations to a matrix" in the distributed streaming model are presented and extensive experiments with real large datasets demonstrate the efficiency of these protocols.

Sketching distributed sliding-window data streams

- Computer Science
- The VLDB Journal
- 2015

This work introduces a novel sketching technique (termed ECM-sketch) that allows effective summarization of streaming data over both time-based and count-based sliding windows with probabilistic accuracy guarantees and is the first work to address efficient, guaranteed-error complex query answering over distributed data streams in the sliding-window model.

Sampling time-based sliding windows in bounded space

- Computer Science
- SIGMOD Conference
- 2008

This paper focuses on sampling schemes that sample from a sliding window over a recent time interval, a popular and highly comprehensible way to model recency. It proves that no bounded-space scheme can guarantee a minimum sample size.

Sampling from a moving window over streaming data

- Computer Science
- SODA '02
- 2002

This work introduces the problem of sampling from a moving window of recent items from a data stream and develops two algorithms. The first, "chain-sample", extends reservoir sampling to handle the expiration of data elements from the sample. The second, "priority-sample", works even when the number of elements in the window varies over time.
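
The priority-sample idea can be sketched for a timestamp-based window. Each arrival gets an independent uniform priority, and the sample is the unexpired item with the highest priority. An item only matters while its priority exceeds that of every later item, so a monotonic deque of candidates suffices. The sketch assumes nondecreasing timestamps and query times, keeps a single sample, and uses a hypothetical class name, `PrioritySampler`.

```python
import random
from collections import deque
from typing import Any, Deque, Optional, Tuple


class PrioritySampler:
    """One uniform sample from the items whose timestamps lie in (now - width, now]."""

    def __init__(self, width: float, seed: Optional[int] = None) -> None:
        self.width = width
        self.rng = random.Random(seed)
        # (timestamp, priority, item): timestamps increase and priorities never
        # increase from front to back, so the front is the best unexpired candidate.
        self.candidates: Deque[Tuple[float, float, Any]] = deque()

    def add(self, timestamp: float, item: Any) -> None:
        p = self.rng.random()
        # An older item with lower priority than a newer one can never be the
        # sample again, because the newer item stays in the window longer.
        while self.candidates and self.candidates[-1][1] < p:
            self.candidates.pop()
        self.candidates.append((timestamp, p, item))

    def sample(self, now: float) -> Optional[Any]:
        # Expired candidates always sit at the front (oldest timestamps).
        while self.candidates and now - self.candidates[0][0] >= self.width:
            self.candidates.popleft()
        return self.candidates[0][2] if self.candidates else None
```

Because the retained priorities are the suffix maxima of random values, the expected number of candidates is only logarithmic in the number of items in the window.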

Maintaining Stream Statistics over Sliding Windows

- Computer Science, Mathematics
- SIAM J. Comput.
- 2002

The problem of maintaining aggregates and statistics over data streams, with respect to the last $N$ data elements seen so far, is considered. Using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.
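
The bucket scheme behind this result, often called DGIM or an exponential histogram, can be sketched in its simplest form. This version allows at most two buckets of each power-of-two size, which bounds the relative error by $1/2$ using $O(\log N)$ buckets. The $(1 + \epsilon)$ version keeps more buckets per size. The class name `OnesCounter` is hypothetical.

```python
from typing import List, Tuple


class OnesCounter:
    """Approximate count of 1s among the last `window` bits (DGIM-style buckets)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.time = 0
        # (timestamp of the bucket's most recent 1, bucket size), newest first;
        # sizes are powers of two, nondecreasing, at most two buckets per size.
        self.buckets: List[Tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its most recent 1 falls outside the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        i = 0
        # Three buckets of one size: merge the two older ones into one bucket of
        # double size, which may cascade to the next size up.
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            ts, size = self.buckets[i + 1]
            self.buckets[i + 1 : i + 3] = [(ts, 2 * size)]
            i += 1

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket can straddle the window edge; count half of it.
        return total - self.buckets[-1][1] // 2
```

Every size below the largest has at least one bucket, and all of those buckets lie fully inside the window. The oldest bucket's error is therefore at most half of the true count.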

Maintaining sliding window skylines on data streams

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2006

This paper proposes algorithms that continuously monitor the incoming data and maintain the skyline incrementally, and utilizes several interesting properties of stream skylines to improve space/time efficiency by expunging data from the system as early as possible (i.e., before their expiration).
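
One property that allows early expunging is easy to illustrate. If a newer point dominates an older one, the older point expires first and so can never re-enter the skyline. The sketch below keeps only points not dominated by any newer point in a count-based window. It assumes smaller is better in every dimension, it is not the paper's full algorithm, and the names `dominates` and `SlidingSkyline` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse in every coordinate, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


class SlidingSkyline:
    def __init__(self, window: int) -> None:
        self.window = window
        self.t = 0
        # (arrival index, point), oldest first; no point is dominated by a newer one.
        self.points: Deque[Tuple[int, Point]] = deque()

    def add(self, point: Point) -> None:
        self.t += 1
        # Expire points that are no longer among the last `window` arrivals.
        while self.points and self.points[0][0] <= self.t - self.window:
            self.points.popleft()
        # Expunge older points the new arrival dominates: they can never return.
        self.points = deque((i, p) for i, p in self.points if not dominates(point, p))
        self.points.append((self.t, point))

    def skyline(self) -> List[Point]:
        # Older points may still dominate newer ones, so filter the retained set.
        pts = [p for _, p in self.points]
        return [p for p in pts if not any(dominates(q, p) for q in pts)]
```

The skyline of the retained set equals the skyline of the full window. Any expunged point is dominated by a newer, still-unexpired point, and dominance is transitive.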

Continuous sampling from distributed streams

- Computer Science
- JACM
- 2012

This article presents communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams, and shows that these protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.

Improved Practical Matrix Sketching with Guarantees

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2016

This paper categorizes and compares the best-known methods with provable guarantees under row-wise streaming updates. It then tweaks some of these methods to gain practical improvements while retaining their guarantees.

Relative Errors for Deterministic Low-Rank Matrix Approximations

- Mathematics
- SODA
- 2014

It is shown that Frequent Directions cannot be adapted in an obvious way to a sparse version that retains $\ell$ original rows of the matrix, as opposed to a linear combination or sketch of the rows.

Improved Approximation Algorithms for Large Matrices via Random Projections

- Computer Science
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time pass efficient matrix computation.
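
The random-projection idea can be illustrated with a dense Gaussian sketching matrix $S \in \mathbb{R}^{k \times n}$ with i.i.d. $N(0, 1/k)$ entries. The much shorter matrix $SA$ approximately preserves $A^T A$. This is a generic Johnson-Lindenstrauss-style example, not the specific construction of the cited paper, and `gaussian_sketch` is a hypothetical name.

```python
import numpy as np


def gaussian_sketch(A: np.ndarray, k: int, seed=None) -> np.ndarray:
    """Return S @ A for a k x n Gaussian S with i.i.d. N(0, 1/k) entries."""
    rng = np.random.default_rng(seed)
    # E[S^T S] = I, so (S A)^T (S A) is an unbiased estimate of A^T A.
    S = rng.standard_normal((k, A.shape[0])) / np.sqrt(k)
    return S @ A
```

Applying a dense $S$ costs $O(knd)$ time. Fast results of this kind typically rely on structured or sparse embeddings instead, which this sketch does not attempt.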