Revisiting Reuse for Approximate Query Processing

@article{Galakatos2017RevisitingRF,
  title={Revisiting Reuse for Approximate Query Processing},
  author={Alex Galakatos and Andrew Crotty and Emanuel Zgraggen and Carsten Binnig and Tim Kraska},
  journal={Proc. VLDB Endow.},
  year={2017},
  volume={10},
  pages={1142--1153}
}
Visual data exploration tools allow users to quickly gather insights from new datasets. As dataset sizes continue to increase, though, new techniques will be necessary to maintain the interactivity guarantees that these tools require. Approximate query processing (AQP) attempts to tackle this problem and allows systems to return query results at "human speed." However, existing AQP techniques start to break down when confronted with ad hoc queries that target the tails of the distribution… 

Model-based Approximate Query Processing

TLDR
A new approach to AQP, called Model-based AQP, is presented that leverages generative models learned over the complete database to answer SQL queries at interactive speeds, and that in many scenarios returns more accurate results in a shorter runtime.

EntropyDB: a probabilistic approach to approximate query processing

TLDR
An interactive data exploration system that uses a probabilistic approach to generate a small, query-able summary of a dataset can successfully answer queries faster than sampling while introducing, on average, no more error than sampling and can better distinguish between rare and nonexistent values.

Approximate Query Processing for Data Exploration using Deep Generative Models

TLDR
This work uses deep generative models, an unsupervised learning based approach, to learn the data distribution faithfully such that aggregate queries could be answered approximately by generating samples from the learned model.

AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics

TLDR
This paper proposes AQP++, a novel framework that achieves a more flexible and better trade-off among preprocessing cost, query response time, and answer quality than AQP or AggPre.

Approximate Query Processing using Deep Generative Models

TLDR
This work uses deep generative models, an unsupervised learning based approach, to learn the data distribution faithfully such that aggregate queries could be answered approximately by generating samples from the learned model.

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

TLDR
This work proposes an algorithm for optimally partitioning the data into such a data structure with various practical approximation techniques and proposes an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling.

Exploration of Knowledge Graphs via Online Aggregation

TLDR
An algorithm for online aggregation that specializes in exploration queries on knowledge graphs is devised, which leverages the low dimension of RDF graphs, and the low selectivity of exploration queries, by augmenting random walks with exact partial computations using a worst-case optimal join algorithm.

Approximating Aggregated SQL Queries with LSTM Networks

TLDR
A method for query approximation, also known as approximate query processing (AQP), that reduces the need to scan data during inference (query calculation), thus enabling a rapid query processing tool, and that produces a lightweight LSTM network providing high query throughput.

Approximate Query Processing: What is New and Where to Go?

TLDR
The survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications, and it outlines research challenges and opportunities for AQP.

References

SHOWING 1-10 OF 36 REFERENCES

Database Learning: Toward a Database that Becomes Smarter Every Time

TLDR
The principle of maximum entropy is exploited to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations and which lead to increasingly faster response times for future queries.

G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data

TLDR
G-OLA, a novel mini-batch execution model that generalizes OLA to support general OLAP queries with arbitrarily nested aggregates using efficient delta maintenance techniques, is implemented in FluoDB, a parallel online query execution framework built on top of the Spark cluster computing framework that can scale to massive data sets.

Distributed and interactive cube exploration

TLDR
DICE is introduced, a distributed system that uses a novel session-oriented model for data cube exploration, designed to provide the user with interactive sub-second latencies for specified accuracy levels.

Revisiting Reuse in Main Memory Database Systems

TLDR
A novel reuse model for intermediates is studied, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations.

Recycling in pipelined query evaluation

TLDR
The novelty of this paper is to show how recycling can successfully be applied in pipelined query executors, by tracking the benefit of materializing possible intermediate results and then choosing the ones making best use of a limited intermediate result cache.

Ripple joins for online aggregation

TLDR
It is shown how ripple joins can be implemented in an existing DBMS using iterators, and an overview is given of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters.

Wander Join: Online Aggregation via Random Walks

TLDR
This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph, and designs an optimizer that chooses the optimal plan for conducting the random walks without having to collect any statistics a priori.

Scalable approximate query processing with the DBO engine

This article describes query processing in the DBO database system. Like other database systems designed for ad hoc analytic processing, DBO is able to compute the exact answers to queries over a

BlinkDB: queries with bounded errors and bounded response times on very large data

TLDR
BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.

Online aggregation

TLDR
A new online aggregation interface is proposed that permits users to both observe the progress of their aggregation queries and control execution on the fly, and a suite of techniques that extend a database system to meet these requirements are presented.