# Streaming Quantiles Algorithms with Small Space and Update Time

@article{Ivkin2019StreamingQA, title={Streaming Quantiles Algorithms with Small Space and Update Time}, author={Nikita Ivkin and Edo Liberty and Kevin J. Lang and Zohar S. Karnin and Vladimir Braverman}, journal={ArXiv}, year={2019}, volume={abs/1907.00236} }

Approximating quantiles and distributions over streaming data has been studied for roughly two decades now. Recently, Karnin, Lang, and Liberty proposed the first asymptotically optimal algorithm for doing so. This manuscript complements their theoretical result by providing practical variants of their algorithm with improved constants. For a given sketch size, our techniques provably reduce the upper bound on the sketch error by a factor of two. These improvements are verified experimentally…
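To make the abstract's setting concrete, here is a minimal sketch of the compactor idea that underlies the Karnin, Lang, and Liberty algorithm. It is an illustration under simplifying assumptions, not the authors' algorithm or the improved variants from this paper: every level uses the same capacity `k` (the real algorithm shrinks capacities at lower levels), and the class name `CompactorSketch` and its methods are hypothetical.

```python
import random


class CompactorSketch:
    """Simplified compactor hierarchy (illustrative only, not the paper's variant).

    levels[h] holds items that each stand for 2**h original stream items.
    Assumption: all levels share one capacity k.
    """

    def __init__(self, k=64, seed=None):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.n = 0                      # number of stream items seen
        self.levels = [[]]
        self.rng = random.Random(seed)

    def update(self, x):
        self.levels[0].append(x)
        self.n += 1
        h = 0
        # Cascade compactions upward while a level is full.
        while len(self.levels[h]) >= self.k:
            self._compact(h)
            h += 1

    def _compact(self, h):
        buf = sorted(self.levels[h])
        # Hold back one item if the count is odd so total weight is preserved.
        keep = [buf.pop()] if len(buf) % 2 else []
        offset = self.rng.randint(0, 1)  # keep odd or even positions at random
        if h + 1 == len(self.levels):
            self.levels.append([])
        self.levels[h + 1].extend(buf[offset::2])
        self.levels[h] = keep

    def rank(self, x):
        """Estimated number of stream items <= x."""
        return sum((1 << h) * sum(1 for v in buf if v <= x)
                   for h, buf in enumerate(self.levels))

    def quantile(self, q):
        """Smallest stored value whose estimated rank reaches q * n."""
        items = sorted((v, 1 << h)
                       for h, buf in enumerate(self.levels) for v in buf)
        if not items:
            raise ValueError("sketch is empty")
        target = q * self.n
        acc = 0
        for v, w in items:
            acc += w
            if acc >= target:
                return v
        return items[-1][0]
```

Feeding a stream through `update` and then calling `quantile(0.5)` returns an approximate median while storing fewer than `k` items per level. Each compaction at level `h` shifts any rank estimate by at most `2**h`, which is the error source that the constants discussed in the abstract bound.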

## 9 Citations

Theory meets Practice: worst case behavior of quantile algorithms

- Computer Science
- ArXiv
- 2021

This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrates that it fails to provide accurate results even on i.i.d. samples from a highly nonuniform distribution, and proposes practical improvements to ReqSketch, making it faster than t-Digest, while its error stays bounded on any instance.

KLL±: Approximate Quantile Sketches over Dynamic Datasets

- Computer Science
- Proc. VLDB Endow.
- 2021

KLL± is proposed, the first quantile approximation algorithm to operate in the bounded deletion model, accounting for both inserts and deletes in a data stream and supporting arbitrary updates with small space overhead.

QPipe: quantiles sketch fully in the data plane

- Computer Science
- CoNEXT
- 2019

This paper introduces QPipe, the first quantiles sketching algorithm that can be implemented entirely in the data plane, and gives novel implementations of argmin(), the major building block of SweepKLL, which is usually not supported in the data plane of commodity switches.

SpaceSaving±: An Optimal Algorithm for Frequency Estimation and Frequent items in the Bounded Deletion Model

- Computer Science
- ArXiv
- 2021

The space lower bound for solving the deterministic frequent items problem in the bounded deletion model is established, the Lazy SpaceSaving± and SpaceSaving± algorithms with optimal space bound are proposed, and an efficient implementation of the SpaceSaving± algorithm is developed that minimizes the latency of update operations using novel data structures.

Approximate Quantiles for Datacenter Telemetry Monitoring

- Computer Science
- 2020 IEEE 36th International Conference on Data Engineering (ICDE)
- 2020

This work proposes AOMG, an efficient and accurate quantile approximation algorithm that capitalizes on insights from the workload study and improves performance through two-level hierarchical windowing, while offering small value errors across a wide range of quantiles by taking into account the density of the underlying data distribution.

I Know What You Did Last Summer: Network Monitoring using Interval Queries

- Computer Science
- Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems
- 2020

This work presents the first integral solution that enables multiple measurement tasks inside the same data structure, supports specifying the time frame of interest as part of its queries, and is sketch-based and thus space efficient.

Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

- Computer Science
- ArXiv
- 2019

Two novel communication-efficient methods over distributed datasets are proposed to mitigate communication overhead: a weighted sampling approach that efficiently estimates the information gain over a small subset, and distributed protocols for the weighted quantile problem used in approximate tree learning.

I Know What You Did Last Summer

- Computer Science
- Proc. ACM Meas. Anal. Comput. Syst.
- 2019

This work presents a first integral solution that enables multiple measurement tasks inside the same data structure, supports specifying the time frame of interest as part of its queries, and is sketch-based and thus space efficient.

Theory meets Practice at the Median: A Worst Case Comparison of Relative Error Quantile Algorithms

- Computer Science, Mathematics
- KDD
- 2021

This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrates that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution, and proposes practical improvements to ReqSketch, making it faster than t-Digest, while its error stays bounded on any instance.

## References

Optimal Quantile Approximation in Streams

- Mathematics, Computer Science
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This paper resolves one of the longest standing basic problems in the streaming computational model, giving a randomized quantile sketch with a matching Ω((1/ε) log log(1/δ)) lower bound, and proves a qualitative gap between randomized and deterministic quantile sketching, for which an Ω((1/ε) log(1/ε)) lower bound is known.

Space-efficient online computation of quantile summaries

- Computer Science
- SIGMOD '01
- 2001

The actual space bounds obtained on experimental data are significantly better than the worst case guarantees of the algorithm as well as the observed space requirements of earlier algorithms.

Approximate medians and other quantiles in one pass and with limited memory

- Computer Science
- SIGMOD '98
- 1998

New algorithms for computing approximate quantiles of large datasets in a single pass are presented, and the main memory requirements are smaller than those reported earlier by an order of magnitude.

Random sampling techniques for space efficient online computation of order statistics of large datasets

- Computer Science
- SIGMOD '99
- 1999

A novel non-uniform random sampling scheme and an extension of this framework are presented which form the basis of a new algorithm which computes approximate quantiles without knowing the input sequence length.

Quantiles over data streams: an experimental study

- Computer Science
- SIGMOD '13
- 2013

This paper proposes and analyzes variations of efficient methods that have not been explicitly studied before, yet which turn out to perform the best, and provides detailed experimental comparisons demonstrating the tradeoffs between space, time, and accuracy for quantile computation.

Quantiles and Equi-depth Histograms over Streams

- Mathematics, Computer Science
- Data Stream Management
- 2016

This chapter presents a broad range of algorithmic ideas for computing quantile summaries of data streams using small space, and highlights connections among these ideas, and how techniques developed for one setting sometimes naturally lend themselves to a seemingly different setting.

Holistic aggregates in a networked world: distributed tracking of approximate quantiles

- Computer Science
- SIGMOD '05
- 2005

This work presents the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network.

Sampling based algorithms for quantile computation in sensor networks

- Computer Science
- SIGMOD '11
- 2011

This paper presents a sampling based quantile computation algorithm with O(√(kh)/ε) total communication (h is the height of the routing tree), which grows sublinearly with the network size except in the pathological case h = Θ(k).

Continuously maintaining quantile summaries of the most recent N elements over a data stream

- Computer Science
- Proceedings. 20th International Conference on Data Engineering
- 2004

An algorithm is presented that maintains quantile summaries for the most recent N elements so that quantile queries on any most recent n elements can be answered with a guaranteed precision of εn, and the space requirement is much less than the given theoretical bound.

Approximate counts and quantiles over sliding windows

- Mathematics, Computer Science
- PODS '04
- 2004

This work considers the problem of maintaining ε-approximate counts and quantiles over a sliding window of a stream using limited space, and presents various deterministic and randomized algorithms for approximate counts and quantiles that require O((1/ε) polylog(1/ε, N)) space.