Matrix Sketching Over Sliding Windows

Zhewei Wei, Xuancheng Liu, Feifei Li, Shuo Shang, Xiaoyong Du, Ji-Rong Wen. Proceedings of the 2016 International Conference on Management of Data.

Large-scale matrix computation has become essential for many data applications, and hence the problem of sketching a matrix with small space and high precision has received extensive study in the past few years. This problem is often considered in the row-update streaming model, where the data set is a matrix $A \in \mathbb{R}^{n \times d}$, and the processor receives a row ($1 \times d$) of $A$ at each timestamp. The goal is to maintain a smaller matrix (termed approximation matrix, or simply approximation) $B \in \mathbb{R}^{\ell \times d}$…
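
To make the row-update model concrete, here is a minimal sketch of a naive baseline over a count-based sliding window. It stores every row in the window exactly, which is the space cost a sketch is meant to avoid. The class name `ExactWindow`, the `window` parameter, and the choice to track the squared Frobenius norm are illustrative assumptions, not taken from the paper.

```python
from collections import deque


class ExactWindow:
    """Naive baseline (hypothetical): keep the last `window` rows verbatim."""

    def __init__(self, window):
        self.window = window
        self.rows = deque()   # rows currently inside the window, oldest first
        self.frob_sq = 0      # squared Frobenius norm of the window matrix

    def add(self, row):
        """Receive one 1 x d row at the current timestamp."""
        row = list(row)
        self.rows.append(row)
        self.frob_sq += sum(x * x for x in row)
        # Expire the oldest row once the window is over capacity.
        if len(self.rows) > self.window:
            old = self.rows.popleft()
            self.frob_sq -= sum(x * x for x in old)
```

This baseline uses space proportional to the window size times d; it is only a reference point against which a smaller approximation could be compared.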

Tracking Matrix Approximation over Distributed Sliding Windows

This paper proposes sampling-based algorithms that continuously track a weighted sample of rows according to their squared norms, which generalize and simplify the sampling techniques in [2], and deterministic tracking algorithms that require only one-way communication and provide a better error guarantee.

Efficient Matrix Sketching over Distributed Data

This paper considers the problem of computing a sketch of a massive data matrix $A \in \mathbb{R}^{n \times d}$, which is distributed across a large number $s$ of servers, and gives a new algorithm for distributed PCA with improved communication cost.

Near Optimal Linear Algebra in the Online and Sliding Window Models

A unified row-sampling based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and $\ell_{1}$-subspace embeddings in the sliding window model, which often use nearly optimal space and achieve nearly input sparsity runtime.

Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order

This work considers a number of aspects of estimating matrix norms in a stream that have not previously been studied, and obtains a near-complete characterization of the memory required of row-order algorithms for estimating Schatten norms of sparse matrices.

Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices

New space-optimal algorithms with faster running times are provided, and it is shown that the running times of these algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.

Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA

This paper proves an almost tight deterministic communication lower bound, then provides a new randomized algorithm with communication cost smaller than the deterministic lower bound and gives an improved distributed PCA algorithm for sparse input matrices, which uses the distributed sketching algorithm as a key building block.

Truly Perfect Samplers for Data Streams and Sliding Windows

This work shows that sublinear-space truly perfect sampling is impossible in the turnstile model, proves a lower bound of Ω(min(n, log 1/γ)) for any G-sampler with point-wise error γ from the true distribution, and gives a general time-efficient sublinear-space framework for developing truly perfect samplers in the insertion-only streaming and sliding window models.

Symmetric Norm Estimation and Regression on Sliding Windows

This work observes that the symmetric norm streaming algorithm of Braverman et al. (STOC 2017) can be reduced to identifying and approximating the frequency of heavy-hitters in a number of substreams, and introduces a heavy-hitter algorithm that gives a $(1 + \epsilon)$-approximation to each of the reported frequencies in the sliding window model.

Sketches for Matrix Norms: Faster, Smaller and More General

It is proved that one can obtain an approximation to $l(A)$ from a sketch $GAH^T$ where $G$ and $H$ are independent Oblivious Subspace Embeddings and the dimension of the sketch is polynomial in the intrinsic dimension of $A$.

Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators

The results show there is no separation between the sliding window model and the standard data stream model in terms of the approximation factor, and the first difference estimators for a wide range of problems are developed.



Simple and deterministic matrix sketching

This paper adapts a well known streaming algorithm for approximating item frequencies to the matrix sketching setting and presents a streaming algorithm whose error decays proportional to 1/l using O(ml) space.
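
The item-frequency algorithm alluded to here is commonly identified as the Misra-Gries summary. As a rough illustration of that starting point (not of the matrix sketching algorithm itself), the sketch below keeps at most k - 1 counters and decrements all of them when a new item arrives and no counter is free. The function name `misra_gries` and the parameter `k` are illustrative assumptions.

```python
def misra_gries(stream, k):
    """Approximate item counts using at most k - 1 counters.

    Each returned count undercounts the true frequency by at most n / k,
    where n is the stream length.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

In the matrix setting the analogous step is usually described as shrinking the singular values of the sketch instead of decrementing counters; that part is not shown here.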

Continuous Matrix Approximation on Distributed Data

Novel algorithms to address the matrix approximation problem of "tracking approximations to a matrix" in the distributed streaming model are presented and extensive experiments with real large datasets demonstrate the efficiency of these protocols.

Sketching distributed sliding-window data streams

This work introduces a novel sketching technique (termed ECM-sketch) that allows effective summarization of streaming data over both time-based and count-based sliding windows with probabilistic accuracy guarantees and is the first work to address efficient, guaranteed-error complex query answering over distributed data streams in the sliding-window model.

Sampling time-based sliding windows in bounded space

This paper focuses on sampling schemes that sample from a sliding window over a recent time interval; such windows are a popular and highly comprehensible method to model recency and it is proved that it is impossible to guarantee a minimum sample size in bounded space.

Sampling from a moving window over streaming data

This work introduces the problem of sampling from a moving window of recent items from a data stream and develops two algorithms: the first, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample, and the second, "priority-sample", works even when the number of elements in the window can vary dynamically over time.
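
The priority idea can be illustrated with a small sketch, written here for the simpler case of a count-based window and a single sample: each item receives a random priority, and the sample is the item with the highest priority among those still in the window. The class `PrioritySampler` and its parameters are hypothetical names, and this is a simplification in the spirit of the technique, not the paper's algorithm.

```python
import random
from collections import deque


class PrioritySampler:
    """One uniform sample from the last `window` items (illustrative)."""

    def __init__(self, window, seed=None):
        self.window = window
        self.rng = random.Random(seed)
        self.t = 0
        # Candidates as (timestamp, priority, item), oldest first,
        # with priorities strictly decreasing from left to right.
        self.cands = deque()

    def add(self, item):
        self.t += 1
        p = self.rng.random()
        # Older items with lower priority can never be the sample again.
        while self.cands and self.cands[-1][1] <= p:
            self.cands.pop()
        self.cands.append((self.t, p, item))
        # Drop candidates that have slid out of the window.
        while self.cands[0][0] <= self.t - self.window:
            self.cands.popleft()

    def sample(self):
        """Return the in-window item with the highest priority."""
        return self.cands[0][2] if self.cands else None
```

Only items that are not dominated by a later, higher-priority item need to be stored, so the candidate list stays short in expectation.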

Maintaining Stream Statistics over Sliding Windows

The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.
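
A simplified bucket-merging counter, in the style often called DGIM or an exponential histogram, gives a feel for how 1's can be counted over the last N elements in small space. The sketch below assumes at most two buckets per size, which yields an estimate within roughly 50% of the true count and not the tunable $1 + \epsilon$ factor stated above. The class name `DGIMCounter` and the parameter `max_per_size` are illustrative assumptions.

```python
from collections import deque


class DGIMCounter:
    """Approximate count of 1's among the last `window` bits (simplified)."""

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.r = max_per_size
        self.t = 0
        # Buckets as (timestamp of most recent 1, size), newest first.
        self.buckets = deque()

    def add(self, bit):
        self.t += 1
        # Expire buckets whose most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.appendleft((self.t, 1))
        # Cascade: while a size has too many buckets, merge its two oldest.
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.r:
                break
            newer, older = idx[-2], idx[-1]
            ts = self.buckets[newer][0]
            del self.buckets[older]
            self.buckets[newer] = (ts, 2 * size)
            size *= 2

    def estimate(self):
        """Sum of all bucket sizes minus half of the oldest bucket."""
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        return total - self.buckets[-1][1] // 2
```

Only the oldest bucket can straddle the window boundary, so the uncertainty is confined to half of that bucket's size; allowing more buckets per size would tighten the error at the cost of more memory.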

Maintaining sliding window skylines on data streams

This paper proposes algorithms that continuously monitor the incoming data and maintain the skyline incrementally, and utilizes several interesting properties of stream skylines to improve space/time efficiency by expunging data from the system as early as possible (i.e., before their expiration).

Continuous sampling from distributed streams

This article presents communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams, and shows that these protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.

Improved Practical Matrix Sketching with Guarantees

This paper attempts to categorize and compare the most known methods under row-wise streaming updates with provable guarantees, and then to tweak some of these methods to gain practical improvements while retaining guarantees.

Relative Errors for Deterministic Low-Rank Matrix Approximations

It is shown that Frequent Directions cannot be adapted to a sparse version in an obvious way that retains the l original rows of the matrix, as opposed to a linear combination or sketch of the rows.