# Clustering Problems on Sliding Windows

@inproceedings{Braverman2016ClusteringPO, title={Clustering Problems on Sliding Windows}, author={Vladimir Braverman and Harry Lang and Keith Levin and Morteza Monemizadeh}, booktitle={SODA}, year={2016} }

We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan [5], which has remained unanswered for over a decade. Our algorithm uses $O(k^3 \log^6 W)$ space and poly(k, log W) update time, where W is the window size. This…

## 33 Citations

k-Center Clustering with Outliers in Sliding Windows

- Computer Science · Algorithms
- 2022

This work provides efficient algorithms for metric k-center clustering in the streaming model under the sliding window setting and shows, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances.

Flattened Exponential Histogram for Sliding Window Queries over Data Streams

- Computer Science · ArXiv
- 2019

The flattened exponential histogram (FEH) model for the Basic Counting problem is presented, and it is shown that with the same memory footprint, the accuracy of the model is between 4 and 15 times better, and on average 7 times better, than that of exponential histograms, while the speed is roughly the same.

Diameter and k-Center in Sliding Windows

- Computer Science, Mathematics · ICALP
- 2016

This paper develops streaming algorithms for the diameter problem and the k-center clustering problem in the sliding window model and proves that any algorithm for the 2-center problem that achieves an approximation ratio of less than 4 requires $\Omega(N^{1/3})$ space.

Submodular Optimization Over Sliding Windows

- Computer Science · WWW
- 2017

This work provides the first non-trivial algorithm that maintains a provable approximation of the optimum using space sublinear in the size of the window, and matches the best known approximation guarantees for submodular optimization in insertion-only streams.

Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches

- Computer Science · SODA
- 2022

This work proposes a new algorithmic framework for designing efficient sliding window algorithms via bucketing-based sketches and develops space-efficient sliding window algorithms for k-cover, k-clustering and diversity maximization problems.

Better Sliding Window Algorithms to Maximize Subadditive and Diversity Objectives

- Computer Science · PODS
- 2019

This work describes an alternative approach to designing efficient sliding window algorithms for maximization problems, and instantiates this approach on a wide range of problems, yielding better algorithms for submodular function optimization, diversity optimization and general subadditive optimization.

Efficient Data Stream Clustering With Sliding Windows Based on Locality-Sensitive Hashing

- Computer Science · IEEE Access
- 2018

This paper improves data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques, and suggests a re-clustering policy that determines whether to append a new summary to pre-existing clusters or to perform clustering on the whole summary.

Sliding Window Algorithms for k-Clustering Problems

- Computer Science · NeurIPS
- 2020

This work provides simple and practical algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch, and finds solutions with costs only slightly higher than those returned by algorithms with access to the full dataset.

Streaming Balanced Clustering

- Computer Science · ArXiv
- 2019

This work develops the first single-pass streaming algorithm for a general class of clustering problems that includes capacitated $k$-median and capacitated $k$-means in Euclidean space, using only $\mathrm{poly}(k d \log \Delta)$ space, where $k$ is the number of clusters, $d$ is the dimension and $\Delta$ is the maximum relative range of a coordinate.

Numerical Linear Algebra in the Sliding Window Model

- Computer Science
- 2018

This work gives a deterministic algorithm that achieves spectral approximation in the sliding window model that can be viewed as a generalization of smooth histograms, using the Loewner ordering of PSD matrices, and gives algorithms for both spectral approximation and low-rank approximation that are space-optimal up to polylogarithmic factors.

## References

Effective Computations on Sliding Windows

- Computer Science · SIAM J. Comput.
- 2010

This paper presents a novel smooth histogram method that is more general and achieves stronger bounds than the exponential histogram, and provides the first approximation algorithms for the following functions: $L_p$ norms, frequency moments, the length of the increasing subsequence, and the geometric mean.

Better streaming algorithms for clustering problems

- Computer Science · STOC '03
- 2003

A randomized algorithm for the k-median problem which produces a constant factor approximation in one pass using storage space O(k poly log n) and gives bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.

Maintaining Stream Statistics over Sliding Windows

- Computer Science, Mathematics · SIAM J. Comput.
- 2002

The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.

Streaming k-means on well-clusterable data

- Computer Science · SODA '11
- 2011

A near-optimal streaming approximation algorithm for k-means in high-dimensional Euclidean space with sublinear memory and a single pass is shown, under the very natural assumption of data separability.

On coresets for k-means and k-median clustering

- Computer Science · STOC '04
- 2004

This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k-median.

Maintaining variance and k-medians over data stream windows

- Computer Science · PODS '03
- 2003

A novel technique is presented for solving two important and related problems in the sliding window model, maintaining variance and maintaining a k-median clustering, and a constant-factor approximation algorithm is presented.

Element Distinctness, Frequency Moments, and Sliding Windows

- Computer Science · 2013 IEEE 54th Annual Symposium on Foundations of Computer Science
- 2013

A randomized algorithm is developed for the element distinctness problem whose time T and space S satisfy $T \in \tilde{O}(n^{3/2}/S^{1/2})$, smaller than previous lower bounds for comparison-based algorithms, showing that element distinctness is strictly easier than sorting for randomized branching programs.

Dynamic Graphs in the Sliding-Window Model

- Computer Science · ESA
- 2013

An extensive set of positive results is presented, including algorithms for constructing basic graph synopses like combinatorial sparsifiers and spanners, as well as for approximating classic graph properties such as the size of a graph matching or minimum spanning tree.

Zero-One Laws for Sliding Windows and Universal Sketches

- Computer Science · APPROX-RANDOM
- 2015

It is shown that it is possible to collect universal statistics of polylogarithmic size, and it is proved that these universal statistics allow us after the fact to compute all other statistics that are computable with similar amounts of memory.

Clustering Data Streams

- Computer Science · FOCS
- 2000

This work gives constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass, and shows negative results implying that these algorithms cannot be improved in a certain sense.