# Relative Error Streaming Quantiles

@article{Cormode2021RelativeES, title={Relative Error Streaming Quantiles}, author={Graham Cormode and Zohar S. Karnin and Edo Liberty and Justin Thaler and Pavel Veselý}, journal={Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems}, year={2021} }

Approximating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of n items from a data universe U equipped with a total order, the task is to compute a sketch (data structure) of size poly(log(n), 1/ε). Given the sketch and a query item y ∈ U, one should be able to approximate its rank in the stream, i.e., the number of stream elements smaller than or equal to y. Most works to date focused on additive εn error…
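
To make the rank definition in the abstract concrete, here is a minimal Python sketch. It computes the exact rank of a query item (the count of stream elements smaller than or equal to it) and checks an estimate against an additive εn bound. The abstract above is truncated before it defines the relative-error guarantee, so the multiplicative check `within_relative_error` is an assumption inferred from the paper's title; all function names are illustrative and are not taken from the paper or from any library.

```python
from bisect import bisect_right


def exact_rank(sorted_items, y):
    """Rank of y: number of stream elements smaller than or equal to y."""
    return bisect_right(sorted_items, y)


def within_additive_error(estimate, true_rank, n, eps):
    """Additive guarantee: the estimate is off by at most eps * n."""
    return abs(estimate - true_rank) <= eps * n


def within_relative_error(estimate, true_rank, eps):
    """Assumed multiplicative guarantee: off by at most eps * true_rank."""
    return abs(estimate - true_rank) <= eps * true_rank


stream = [5, 1, 9, 3, 7, 3, 8, 2, 6, 4]
sorted_items = sorted(stream)  # exact baseline; a real sketch would not store everything
n = len(sorted_items)

true_rank = exact_rank(sorted_items, 3)  # elements <= 3 are 1, 2, 3, 3
estimate = 6                             # a hypothetical sketch answer
print(true_rank)
print(within_additive_error(estimate, true_rank, n, eps=0.25))
print(within_relative_error(estimate, true_rank, eps=0.25))
```

With these toy numbers the same estimate satisfies the additive bound yet violates the multiplicative one, which illustrates why a relative guarantee is the stricter requirement for items of low rank.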

## 11 Citations

Bounded Space Differentially Private Quantiles

- Computer Science
- 2021

This work devises a differentially private algorithm for the quantile estimation problem, with strongly sublinear space complexity, in the one-shot and continual observation settings, and presents another algorithm based on histograms that is especially suited to the multiple quantiles case.

Theory meets Practice: worst case behavior of quantile algorithms

- Computer Science, ArXiv
- 2021

This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrates that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution, and proposes practical improvements to ReqSketch, making it faster than t-digest, while its error stays bounded on any instance.

Theory meets Practice at the Median: A Worst Case Comparison of Relative Error Quantile Algorithms

- Computer Science, KDD
- 2021

This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrates that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution, and proposes practical improvements to ReqSketch, making it faster than t-digest, while its error stays bounded on any instance.

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

- Computer Science, ArXiv
- 2022

This work designs an algorithm that augments a quantile sketch within each entry of a heavy hitter algorithm, resulting in similar space complexity but with a deterministic error guarantee, and presents SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity.

Asymmetric scale functions for t-digests

- Mathematics, Journal of Statistical Computation and Simulation
- 2021

A t-digest variant with accuracy asymmetric about the median is developed, thereby making possible alternative trade-offs between computational resources and accuracy which may be of particular interest for distributions with significant skew.

Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

- Computer Science, ArXiv
- 2021

Amazon SageMaker Model Monitor is presented, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker and automatically detects data, concept, bias, and feature attribution drift in models in real-time and provides alerts so that model owners can take corrective actions and thereby maintain high quality models.

Current Trends in Data Summaries

- Computer Science, SIGMOD Rec.
- 2021

In this column, recent developments in data summarization are surveyed, with the intent of inspiring further advances.

Relative Error Streaming Quantiles

- Computer Science, SIGMOD Rec.
- 2022

This paper presents a new approach to estimating ranks, quantiles, and distributions over streaming data by computing a sketch of size polylogarithmic in n, where the n stream items come from a data universe equipped with a total order.

A Human-Centric Take on Model Monitoring

- Computer Science, ArXiv
- 2022

This work identifies the need, and the challenge, for model monitoring systems to clarify the impact of monitoring observations on outcomes, and finds that such insights must be actionable, robust, customizable for domain-specific use cases, and cognitively considerate to avoid information overload.

Technical Perspective

- Computer Science, SIGMOD Rec.
- 2022

Solutions to this problem have numerous applications in large-scale data analysis and can potentially be used for range query selectivity estimation in database engines.

## References

Showing 1-10 of 36 references.

Optimal Quantile Approximation in Streams

- Computer Science, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This paper resolves one of the longest standing basic problems in the streaming computational model and proves a qualitative gap between randomized and deterministic quantile sketching for which an Ω((1/ε) log log(1/δ)) lower bound is known.

A Tight Lower Bound for Comparison-Based Quantile Summaries

- Computer Science, PODS
- 2020

This paper focuses on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe, and improves the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1±ε)⋅φ, and for other related computational tasks.

DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees

- Computer Science, Proc. VLDB Endow.
- 2019

This work presents the first fully-mergeable, relative-error quantile sketching algorithm with formal guarantees, which is extremely fast and accurate, and is currently being used by Datadog at wide scale.

Space- and time-efficient deterministic algorithms for biased quantiles over data streams

- Computer Science, PODS '06
- 2006

This work presents the first deterministic algorithms for answering biased quantiles queries accurately in one pass with small space and time bounds (sublinear in the input size), and shows that the approach uses less space than existing methods in many practical settings and is fast to maintain.

Space-efficient online computation of quantile summaries

- Computer Science, SIGMOD '01
- 2001

The actual space bounds obtained on experimental data are significantly better than the worst case guarantees of the algorithm as well as the observed space requirements of earlier algorithms.

Effective computation of biased quantiles over data streams

- Computer Science, 21st International Conference on Data Engineering (ICDE'05)
- 2005

This paper formalizes them as the "high-biased" and the "targeted" quantiles problems, respectively, and presents algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems.

A Randomized Online Quantile Summary in O((1/ε) log(1/ε)) Words

- Computer Science, Theory Comput.
- 2017

This paper develops a randomized online quantile summary for the cash register data input model and comparison data domain model that uses O((1/ε) log(1/ε)) words of memory, improving upon the previous best upper bound.

An efficient algorithm for approximate biased quantile computation in data streams

- Computer Science, CIKM '07
- 2007

This work proposes an efficient algorithm that dynamically maintains the biased quantile summary for the entire stream as the exponential histogram over the block-wise quantile summaries in large data streams.

Quantiles over data streams: experimental comparisons, new analyses, and further improvements

- Computer Science, The VLDB Journal
- 2016

This paper provides a taxonomy of different methods, proposes new variants that have not been studied before yet outperform existing methods, and describes efficient implementations of these methods.

Random sampling techniques for space efficient online computation of order statistics of large datasets

- Computer Science, SIGMOD '99
- 1999

A novel non-uniform random sampling scheme and an extension of this framework are presented which form the basis of a new algorithm which computes approximate quantiles without knowing the input sequence length.