Corpus ID: 236447675

Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

@article{Kang2021ProofAA,
  title={Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates},
  author={Daniel Kang and John Guibas and Peter Bailis and Tatsunori B. Hashimoto and Yi Sun and Matei A. Zaharia},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.12525}
}
Given a dataset D, we are interested in computing the mean of a subset of D which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget N . In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate O(N 1 +N −1 2 +N 1/2 1 N −3/2 2 ), where N = K ·N1 +N2 for some integer constant K and K ·N1 and N2 represent the number of samples used in Stage 1 and Stage 2 of ABae… Expand

Tables from this paper

Accelerating Approximate AggregationQueries with Expensive Predicates
Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deepExpand
Accelerating Approximate Aggregation Queries with Expensive Predicates
TLDR
A query processing algorithm that leverages proxies (ABae) is developed and analyzed that converges at an optimal rate in a novel analysis of stratified sampling with draws that may not satisfy the predicate and outperforms on baselines on six real-world datasets. Expand

References

SHOWING 1-3 OF 3 REFERENCES
Connected Components in Random Graphs with Given Expected Degree Sequences
Abstract. We consider a family of random graphs with a given expected degree sequence. Each edge is chosen independently with probability proportional to the product of the expected degrees of itsExpand
U-statistics Notes for Statistics 200c, Spring 2005
1. Definitions. The basic theory of U-statistics was developed by W. Hoeffding (1948a). Detailed expositions of the general topic may be found in M. Denker (1985) and A. J. Lee (1990). See alsoExpand
Advanced statistical theory lecture
  • September 24
  • 2018