Clustering Problems on Sliding Windows

@inproceedings{Braverman2016ClusteringPO,
  title={Clustering Problems on Sliding Windows},
  author={Vladimir Braverman and Harry Lang and Keith Levin and Morteza Monemizadeh},
  booktitle={SODA},
  year={2016}
}
We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan [5], which has remained unanswered for over a decade. Our algorithm uses $O(k^3 \log^6 W)$ space and $\mathrm{poly}(k, \log W)$ update time, where W is the window size. This…
k-Center Clustering with Outliers in Sliding Windows
TLDR
This work provides efficient algorithms for metric k-center clustering in the streaming model under the sliding window setting and shows, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances.
Flattened Exponential Histogram for Sliding Window Queries over Data Streams
TLDR
The flattened exponential histogram (FEH) model for the Basic Counting problem is presented, and it is shown that, with the same memory footprint, the accuracy of the model is between 4 and 15 times, and on average 7 times, better than that of the exponential histogram, while the speed is roughly the same.
Diameter and k-Center in Sliding Windows
TLDR
This paper develops streaming algorithms for the diameter problem and the k-center clustering problem in the sliding window model and proves that any algorithm for the 2-center problem that achieves an approximation ratio of less than 4 requires $\Omega(N^{1/3})$ space.
Submodular Optimization Over Sliding Windows
TLDR
This work provides the first non-trivial algorithm that maintains a provable approximation of the optimum using space sublinear in the size of the window, and matches the best known approximation guarantees for submodular optimization in insertion-only streams.
Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches
TLDR
This work proposes a new algorithmic framework for designing efficient sliding window algorithms via bucketing-based sketches and develops space-efficient sliding window algorithms for k-cover, k-clustering and diversity maximization problems.
Better Sliding Window Algorithms to Maximize Subadditive and Diversity Objectives
TLDR
This work describes an alternative approach to designing efficient sliding window algorithms for maximization problems, and instantiates this approach on a wide range of problems, yielding better algorithms for submodular function optimization, diversity optimization and general subadditive optimization.
Efficient Data Stream Clustering With Sliding Windows Based on Locality-Sensitive Hashing
TLDR
This paper improves data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques, and suggests a re-clustering policy that determines whether to append a new summary to pre-existing clusters or to perform clustering on the whole summary.
Sliding Window Algorithms for k-Clustering Problems
TLDR
This work provides simple and practical algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch, and finds solutions with costs only slightly higher than those returned by algorithms with access to the full dataset.
Streaming Balanced Clustering
TLDR
This work develops the first single-pass streaming algorithm for a general class of clustering problems that includes capacitated $k$-median and capacitated $k$-means in Euclidean space, using only $\mathrm{poly}(k d \log \Delta)$ space, where $k$ is the number of clusters, $d$ is the dimension and $\Delta$ is the maximum relative range of a coordinate.
Numerical Linear Algebra in the Sliding Window Model
TLDR
This work gives a deterministic algorithm that achieves spectral approximation in the sliding window model that can be viewed as a generalization of smooth histograms, using the Loewner ordering of PSD matrices, and gives algorithms for both spectral approximation and low-rank approximation that are space-optimal up to polylogarithmic factors.

References

Effective Computations on Sliding Windows
TLDR
This paper presents a novel smooth histogram method that is more general and achieves stronger bounds than the exponential histogram, and provides the first approximation algorithms for the following functions: $L_p$ norms, frequency moments, the length of the increasing subsequence, and the geometric mean.
Better streaming algorithms for clustering problems
TLDR
A randomized algorithm for the k-median problem which produces a constant factor approximation in one pass using storage space O(k poly log n) and gives bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
Maintaining Stream Statistics over Sliding Windows
TLDR
The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.
Streaming k-means on well-clusterable data
TLDR
A near-optimal streaming approximation algorithm for k-means in high-dimensional Euclidean space with sublinear memory and a single pass is shown, under the very natural assumption of data separability.
On coresets for k-means and k-median clustering
TLDR
This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k-median.
Maintaining variance and k-medians over data stream windows
TLDR
A novel technique is presented for solving two important and related problems in the sliding window model: maintaining variance and maintaining a k-median clustering. A constant-factor approximation algorithm is presented.
Element Distinctness, Frequency Moments, and Sliding Windows
TLDR
A randomized algorithm is developed for the element distinctness problem whose time T and space S satisfy $T \in \tilde{O}(n^{3/2}/S^{1/2})$, smaller than previous lower bounds for comparison-based algorithms, showing that element distinctness is strictly easier than sorting for randomized branching programs.
Dynamic Graphs in the Sliding-Window Model
TLDR
An extensive set of positive results including algorithms for constructing basic graph synopses like combinatorial sparsifiers and spanners as well as approximating classic graph properties such as the size of a graph matching or minimum spanning tree are presented.
Zero-One Laws for Sliding Windows and Universal Sketches
TLDR
It is shown that it is possible to collect universal statistics of polylogarithmic size, and it is proved that these universal statistics allow us after the fact to compute all other statistics that are computable with similar amounts of memory.
Clustering Data Streams
TLDR
This work gives constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass, and shows negative results implying that these algorithms cannot be improved in a certain sense.
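The basic counting problem summarized under "Maintaining Stream Statistics over Sliding Windows" above (estimating the number of 1's among the last N elements in polylogarithmic space) can be illustrated with a small exponential-histogram-style sketch. This is a simplified textbook variant, not the authors' exact construction: the class name `ExponentialHistogram`, the parameter `max_per_size`, and the merge policy are assumptions made for illustration. With at most two buckets per size, the estimate should stay within 50% of the true count; allowing more buckets per size trades memory for accuracy.

```python
from collections import deque


class ExponentialHistogram:
    """Approximate count of 1's in the last `window` stream elements.

    Illustrative sketch only (names and merge policy are assumptions).
    Buckets are stored newest first as (timestamp_of_newest_1, size),
    where size is a power of two.
    """

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.max_per_size = max_per_size
        self.time = 0
        self.buckets = deque()

    def add(self, bit):
        self.time += 1
        # Expire buckets whose newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit:
            self.buckets.appendleft((self.time, 1))
            self._merge()

    def _merge(self):
        items = list(self.buckets)
        i = 0
        while i < len(items):
            size = items[i][1]
            j = i
            while j < len(items) and items[j][1] == size:
                j += 1
            if j - i <= self.max_per_size:
                break  # no overflow at this size, so larger sizes are unchanged
            # Merge the two oldest buckets of this size into one of double
            # size, keeping the newer of the two timestamps.
            items[j - 2:j] = [(items[j - 2][0], 2 * size)]
            i = j - 2  # the merged bucket may overflow the next size up
        self.buckets = deque(items)

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may lie inside the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2
```

A caller would feed each arriving bit to `add` and query `estimate` at any time; the number of stored buckets grows only logarithmically with the window size, which is the point of the technique.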