# A Survey of Approximate Quantile Computation on Large-Scale Data

@article{Chen2020ASO, title={A Survey of Approximate Quantile Computation on Large-Scale Data}, author={Zhiwei Chen and Aoqian Zhang}, journal={IEEE Access}, year={2020}, volume={8}, pages={34585-34597} }

As data volumes grow, data profiling helps to extract metadata from large-scale data. However, one kind of metadata, order statistics, is difficult to compute because order statistics are neither mergeable nor incremental. Consequently, limits on time and memory make their exact computation on large-scale data impractical. In this paper, we focus on one order statistic, quantiles, and present a comprehensive analysis of studies on approximate quantile computation. Both deterministic algorithms and…
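The abstract's point that order statistics are not mergeable can be illustrated with a minimal Python sketch. It is not taken from the paper: the helper `merge_means` and the toy partitions are hypothetical, chosen only to contrast a mergeable statistic (the mean) with a non-mergeable one (the median).

```python
import statistics


def merge_means(parts):
    """Combine per-partition (sum, count) pairs into a global mean.

    Hypothetical helper for illustration; not from the surveyed paper.
    """
    total = sum(s for s, _ in parts)
    count = sum(c for _, c in parts)
    return total / count


# Toy data split across two partitions (assumed for the example).
partitions = [[1, 2, 3], [4, 5, 6, 7, 100]]
all_values = [x for part in partitions for x in part]

# The mean is mergeable: small per-partition summaries are enough.
merged_mean = merge_means([(sum(p), len(p)) for p in partitions])
exact_mean = sum(all_values) / len(all_values)

# The median is not: combining per-partition medians does not, in
# general, give the median of the full dataset.
partition_medians = [statistics.median(p) for p in partitions]
naive_median = statistics.median(partition_medians)
exact_median = statistics.median(all_values)

print(merged_mean == exact_mean)    # True
print(naive_median, exact_median)   # 4.0 4.5
```

The mean is recovered exactly from two numbers per partition, while the median of the per-partition medians misses the true median even on eight values. Approximate quantile summaries of the kind surveyed here aim to bound that error while keeping the per-partition state small.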

## 10 Citations

Efficient and Error-bounded Spatiotemporal Quantile Monitoring in Edge Computing Environments

- Computer Science
- Proc. VLDB Endow.
- 2022

This paper designs a processing framework that virtualizes edge-resident data sketches for quantile computing and devises a relaxation algorithm that converges to optimal latencies for those subqueries whose result errors are still bounded.

KLL±: Approximate Quantile Sketches over Dynamic Datasets

- Computer Science
- Proc. VLDB Endow.
- 2021

KLL± is proposed, the first quantile approximation algorithm to operate in the bounded deletion model to account for both inserts and deletes in a given data stream to support arbitrary updates with small space overhead.

Optimal Round and Sample-Size Complexity for Partitioning in Parallel Sorting

- Computer Science
- ArXiv
- 2022

This work derives lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. To derive the lower bound, it proposes a hard randomized input distribution and applies classical results from the distribution theory of runs.

Histogram Specification by Assignment of Optimal Unique Values

- Computer Science
- 2021

Two novel algorithms for histogram specification and quantile transformation of data without local information are proposed that can be easily incorporated in applications spanning many disciplines, especially in applied data science.

Scaling Equi-Joins

- Computer Science
- SIGMOD Conference
- 2022

This paper proposes Adaptive-Multistage-Join (AM-Join), a novel algorithm that scales well when the joined tables share hot keys, and Broadcast-Join, the fastest-known when joining keys are hot in only one table, for scalable and fast equi-joins in distributed shared-nothing architectures.

Spiking Neural Networks Through the Lens of Streaming Algorithms

- Computer Science
- DISC
- 2020

A generic reduction is given, showing that any space-efficient spiking neural network can be simulated by a space-efficient streaming algorithm, and establishing a close connection between these two models.

Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare

- Computer Science
- 2020

Different variability metrics applied to irregularly (nonuniformly) sampled time series, which have important clinical applications, particularly in mental healthcare, are compared to identify the most robust and interpretable variability measures out of a set of 21 candidates.

AcME - Accelerated Model-agnostic Explanations: Fast Whitening of the Machine-Learning Black Box

- Computer Science
- ArXiv
- 2021

This paper presents Accelerated Model-agnostic Explanations (AcME), an interpretability approach that quickly provides feature importance scores at both the global and the local level and can be applied a posteriori to any regression or classification model.

hermiter: R package for Sequential Nonparametric Estimation

- Mathematics
- 2021

This article introduces the R package hermiter which facilitates estimation of univariate and bivariate probability density functions and cumulative distribution functions along with full quantile…

Statistical anonymity: Quantifying reidentification risks without reidentifying users

- Computer Science
- ArXiv
- 2022

This paper explores ideas — objectives, metrics, protocols, and extensions — for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of k-anonymity, and describes a class of protocols aimed at achieving these goals.

## References

Space- and time-efficient deterministic algorithms for biased quantiles over data streams

- Computer Science
- PODS '06
- 2006

This work presents the first deterministic algorithms for answering biased quantiles queries accurately with small—sublinear in the input size—space and time bounds in one pass, and shows it uses less space than existing methods in many practical settings, and is fast to maintain.

Quantiles over data streams: experimental comparisons, new analyses, and further improvements

- Computer Science
- The VLDB Journal
- 2016

This paper provides a taxonomy of different methods, proposes new variants that have not been studied before yet outperform existing methods, and describes efficient implementations of these methods.

Computing Extremely Accurate Quantiles Using t-Digests

- Computer Science
- ArXiv
- 2019

This new algorithm is robust with respect to skewed distributions or ordered datasets and allows separately computed summaries to be combined with no loss in accuracy.

A Fast Algorithm for Approximate Quantiles in High Speed Data Streams

- Computer Science
- 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)
- 2007

A fast algorithm is presented for computing approximate quantiles in high-speed data streams with deterministic error bounds. For data streams of size N, where N is unknown in advance, the stream is partitioned into sub-streams of exponentially increasing size as they arrive.

Effective computation of biased quantiles over data streams

- Computer Science
- 21st International Conference on Data Engineering (ICDE'05)
- 2005

This paper formalizes them as the "high-biased" and the "targeted" quantiles problems, respectively, and presents algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems.

Accurate quantile estimation for skewed data streams

- Computer Science
- 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC)
- 2017

The comprehensive experimental evaluation results demonstrate that the estimated quantiles of the proposed algorithm are more accurate than those of existing methods in the literature on both synthetic and real-world datasets, especially on important extreme quantiles.

Approximate medians and other quantiles in one pass and with limited memory

- Computer Science
- SIGMOD '98
- 1998

New algorithms for computing approximate quantiles of large datasets in a single pass are presented, and the main memory requirements are smaller than those reported by an order of magnitude.

A One-Pass Space-Efficient Algorithm for Finding Quantiles

- Computer Science
- COMAD
- 1995

An algorithm is presented for finding the quantile values of a large unordered dataset with unknown distribution. It requires only one pass over the data, and the true quantile is guaranteed to lie within the lower and upper bounds produced by the algorithm.

Accurate Quantile Estimation for Skewed Data Streams Using Nonlinear Interpolation

- Computer Science, Engineering
- IEEE Access
- 2018

The comprehensive experimental evaluation results demonstrate that the estimated quantiles of the proposed algorithm are more accurate than those of existing methods in the literature on both synthetic and real-world datasets, especially on important extreme quantiles.

A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data

- Computer Science
- VLDB
- 1997

The experimental results show that the algorithm is robust and does not depend on the distribution of the data sets, and that the extra time and memory for computing additional quantiles (beyond the first one) are constant per quantile.