Corpus ID: 195767658

Streaming Quantiles Algorithms with Small Space and Update Time

@article{Ivkin2019StreamingQA,
  title={Streaming Quantiles Algorithms with Small Space and Update Time},
  author={Nikita Ivkin and Edo Liberty and Kevin J. Lang and Zohar S. Karnin and Vladimir Braverman},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.00236}
}
Approximating quantiles and distributions over streaming data has been studied for roughly two decades. Recently, Karnin, Lang, and Liberty proposed the first asymptotically optimal algorithm for this problem. This manuscript complements their theoretical result by providing practical variants of their algorithm with improved constants. For a given sketch size, our techniques provably reduce the upper bound on the sketch error by a factor of two. These improvements are verified experimentally… 
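The compactor idea behind Karnin, Lang, and Liberty's algorithm can be illustrated with a minimal Python sketch. It assumes the basic KLL scheme: level h stores items of weight 2^h, capacities shrink geometrically (factor `c`, assumed to be 2/3) below the top level, and a full compactor sorts its items and promotes every other one starting from a random offset. The class name `KLLSketch` and the parameters `k`, `c`, and `seed` are hypothetical, and the code does not reproduce the improved-constant variants proposed in this manuscript.

```python
import random
from typing import List, Optional


class KLLSketch:
    """Simplified KLL-style quantile sketch (illustrative, not the paper's tuned variants)."""

    def __init__(self, k: int = 200, c: float = 2 / 3, seed: Optional[int] = None) -> None:
        self.k = k                    # capacity of the top compactor (hypothetical default)
        self.c = c                    # capacity decay per level below the top (assumed 2/3)
        self.rng = random.Random(seed)
        self.levels: List[List[float]] = [[]]  # levels[h] holds items of weight 2**h
        self.n = 0                    # number of stream items seen

    def _capacity(self, h: int) -> int:
        depth = len(self.levels) - 1 - h       # 0 for the current top level
        return max(2, int(self.k * self.c ** depth))

    def update(self, x: float) -> None:
        self.levels[0].append(x)
        self.n += 1
        for h in range(len(self.levels)):      # cascade compactions bottom-up
            if len(self.levels[h]) >= self._capacity(h):
                self._compact(h)

    def _compact(self, h: int) -> None:
        if h + 1 == len(self.levels):
            self.levels.append([])             # grow a new top level
        buf = sorted(self.levels[h])
        even = len(buf) - len(buf) % 2         # compact an even number of items
        offset = self.rng.randint(0, 1)        # random parity keeps rank estimates unbiased
        self.levels[h + 1].extend(buf[offset:even:2])  # survivors double in weight
        self.levels[h] = buf[even:]            # at most one leftover item stays behind

    def rank(self, x: float) -> int:
        """Estimated number of stream items <= x."""
        return sum(2 ** h * sum(1 for y in level if y <= x)
                   for h, level in enumerate(self.levels))

    def quantile(self, q: float) -> float:
        """Smallest retained item whose cumulative weight reaches q * n."""
        weighted = sorted((y, 2 ** h) for h, level in enumerate(self.levels) for y in level)
        if not weighted:
            raise ValueError("empty sketch")
        cum = 0
        for y, w in weighted:
            cum += w
            if cum >= q * self.n:
                return y
        return weighted[-1][0]


if __name__ == "__main__":
    data = list(range(10_000))
    random.Random(1).shuffle(data)
    sketch = KLLSketch(k=200, seed=7)
    for v in data:
        sketch.update(v)
    print(sketch.rank(4_999), sketch.quantile(0.5))  # both roughly 5000, not exact
```

Each compaction preserves total weight, so `rank(float("inf"))` always equals `n`; only the randomized choice of which half survives introduces rank error.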
Theory meets Practice: worst case behavior of quantile algorithms
TLDR
This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error, demonstrates that t-digest fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution, and proposes practical improvements to ReqSketch that make it faster than t-digest while keeping its error bounded on any instance.
KLL±: Approximate Quantile Sketches over Dynamic Datasets
TLDR
This work proposes KLL±, the first quantile approximation algorithm to operate in the bounded deletion model; it accounts for both inserts and deletes in a data stream and supports arbitrary updates with small space overhead.
QPipe: quantiles sketch fully in the data plane
TLDR
This paper introduces QPipe, the first quantiles sketching algorithm that can be implemented entirely in the data plane, and gives novel implementations of argmin(), the main building block of SweepKLL and an operation usually not supported in the data plane of commodity switches.
SpaceSaving±: An Optimal Algorithm for Frequency Estimation and Frequent items in the Bounded Deletion Model
TLDR
This work establishes the space lower bound for solving the deterministic frequent items problem in the bounded deletion model, proposes the Lazy SpaceSaving± and SpaceSaving± algorithms with optimal space bounds, and develops an efficient implementation of SpaceSaving± that minimizes the latency of update operations using novel data structures.
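As background for the algorithms above, here is a minimal Python sketch of the classic insertion-only SpaceSaving algorithm that SpaceSaving± builds on: k counters, where an unmonitored item evicts the smallest counter and inherits its value plus one. It handles no deletions, and the linear-time minimum scan is a simple stand-in for the specialized data structures the paper develops. The class and method names are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict


class SpaceSaving:
    """Classic insertion-only SpaceSaving with k counters (not SpaceSaving±)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.counts: Dict[Hashable, int] = {}   # monitored item -> counter
        self.errors: Dict[Hashable, int] = {}   # monitored item -> maximum overestimate

    def update(self, x: Hashable) -> None:
        if x in self.counts:
            self.counts[x] += 1
        elif len(self.counts) < self.k:
            self.counts[x] = 1
            self.errors[x] = 0
        else:
            # Evict the minimum counter; the newcomer inherits its value plus one.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[x] = floor + 1
            self.errors[x] = floor

    def estimate(self, x: Hashable) -> int:
        """Upper bound on x's frequency; exceeds the true count by at most n / k."""
        if x in self.counts:
            return self.counts[x]
        # No eviction has happened until all k counters are in use, so unseen means 0.
        return min(self.counts.values()) if len(self.counts) == self.k else 0

    def heavy_hitters(self, threshold: int) -> Dict[Hashable, int]:
        """Items whose guaranteed count (counter minus error) exceeds threshold."""
        return {x: c for x, c in self.counts.items() if c - self.errors[x] > threshold}
```

Every update adds exactly one to the sum of the counters, so that sum equals n; the minimum counter, and hence any overestimate, is therefore at most n/k.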
Approximate Quantiles for Datacenter Telemetry Monitoring
TLDR
This work proposes AOMG, an efficient and accurate quantile approximation algorithm that capitalizes on insights from the workload study. AOMG improves performance through two-level hierarchical windowing and keeps value errors small across a wide range of quantiles by taking the density of the underlying data distribution into account.
I Know What You Did Last Summer: Network Monitoring using Interval Queries
TLDR
This work presents the first integral solution that enables multiple measurement tasks inside the same data structure, supports specifying the time frame of interest as part of its queries, and is sketch-based and thus space efficient.
Communication-Efficient Weighted Sampling and Quantile Summary for GBDT
TLDR
This work proposes two novel communication-efficient methods over distributed datasets to mitigate communication overhead: a weighted sampling approach that efficiently estimates the information gain over a small subset, and distributed protocols for the weighted quantile problem used in approximate tree learning.
I Know What You Did Last Summer
TLDR
This work presents the first integral solution that enables multiple measurement tasks inside the same data structure, supports specifying the time frame of interest as part of its queries, and is sketch-based and thus space efficient.
Theory meets Practice at the Median: A Worst Case Comparison of Relative Error Quantile Algorithms
TLDR
This work shows how to construct inputs for t-digest that induce an almost arbitrarily large error, demonstrates that t-digest fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution, and proposes practical improvements to ReqSketch that make it faster than t-digest while keeping its error bounded on any instance.

References

Showing 1-10 of 24 references
Optimal Quantile Approximation in Streams
TLDR
This paper resolves one of the longest standing basic problems in the streaming computational model and proves a qualitative gap between randomized and deterministic quantile sketching for which an Ω((1/ε)log log (1/δ)) lower bound is known.
Space-efficient online computation of quantile summaries
TLDR
The actual space bounds obtained on experimental data are significantly better than the worst case guarantees of the algorithm as well as the observed space requirements of earlier algorithms.
Approximate medians and other quantiles in one pass and with limited memory
TLDR
New algorithms for computing approximate quantiles of large datasets in a single pass are presented; their main memory requirements are an order of magnitude smaller than those previously reported.
Random sampling techniques for space efficient online computation of order statistics of large datasets
TLDR
A novel non-uniform random sampling scheme and an extension of this framework are presented which form the basis of a new algorithm which computes approximate quantiles without knowing the input sequence length.
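The paper's scheme samples non-uniformly; as a simpler stand-in, the Python sketch below uses plain uniform reservoir sampling (the standard Algorithm R), which likewise needs no advance knowledge of the stream length. As a rough sizing guide, the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality for i.i.d. samples suggests that m ≥ ln(2/δ)/(2ε²) sampled items put every quantile within rank error εn with probability at least 1 − δ. The function names `dkw_sample_size` and `reservoir_quantiles` are hypothetical.

```python
import math
import random
from typing import Iterable, List, Optional


def dkw_sample_size(eps: float, delta: float) -> int:
    """Sample size from the i.i.d. DKW bound: m >= ln(2/delta) / (2 * eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def reservoir_quantiles(stream: Iterable[float], m: int, qs: List[float],
                        seed: Optional[int] = None) -> List[float]:
    """Estimate stream quantiles from a uniform reservoir sample of size m."""
    rng = random.Random(seed)
    sample: List[float] = []
    for i, x in enumerate(stream):      # i = number of items seen before x
        if i < m:
            sample.append(x)            # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive bounds: keep x with prob m / (i + 1)
            if j < m:
                sample[j] = x
    if not sample:
        raise ValueError("empty stream")
    sample.sort()
    # Nearest-rank quantile of the sample, used as the estimate for the whole stream.
    return [sample[min(len(sample) - 1, int(q * len(sample)))] for q in qs]
```

For example, `reservoir_quantiles(range(100_000), 5_000, [0.5, 0.9], seed=11)` returns estimates near 50,000 and 90,000; when the stream is shorter than `m`, the reservoir holds every item and the answers are exact.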
Quantiles over data streams: an experimental study
TLDR
This paper proposes and analyzes variations of efficient methods that have not been explicitly studied before, yet which turn out to perform the best, and provides detailed experimental comparisons demonstrating the tradeoffs between space, time, and accuracy for quantile computation.
Quantiles and Equi-depth Histograms over Streams
TLDR
This chapter presents a broad range of algorithmic ideas for computing quantile summaries of data streams using small space, and highlights connections among these ideas, and how techniques developed for one setting sometimes naturally lend themselves to a seemingly different setting.
Holistic aggregates in a networked world: distributed tracking of approximate quantiles
TLDR
This work presents the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network.
Sampling based algorithms for quantile computation in sensor networks
TLDR
This paper presents a sampling-based quantile computation algorithm with O(√(kh)/ε) total communication (k is the number of nodes and h is the height of the routing tree), which grows sublinearly with the network size except in the pathological case h = Θ(k).
Continuously maintaining quantile summaries of the most recent N elements over a data stream
TLDR
This work presents an algorithm that maintains quantile summaries for the most recent N elements so that quantile queries on any most recent n elements can be answered with a guaranteed precision of εn; its space requirement is much lower than the stated theoretical bound.
Approximate counts and quantiles over sliding windows
TLDR
This work considers the problem of maintaining ε-approximate counts and quantiles over a sliding window of a stream using limited space, and presents various deterministic and randomized algorithms for approximate counts and quantiles that require O((1/ε) polylog(1/ε, N)) space.