Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

Jeremy K.-P. Chen, Yuqing Huang, Mushi Wang, Semih Salihoglu, Ken Salem. Proc. VLDB Endow.
This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information-theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate…

SafeBound: A Practical System for Generating Cardinality Bounds

This paper introduces SafeBound, the first practical system for generating cardinality bounds. By improving the runtime of the expensive queries, it achieves up to 80% lower end-to-end runtimes than PostgreSQL and is on par with or better than state-of-the-art ML-based estimators and pessimistic cardinality estimators.

Flow-Loss: Learning Cardinality Estimates That Matter

Flow-Loss is a new loss function for learning cardinality estimation models. It approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans.



Cost-Guided Cardinality Estimation: Focus Where it Matters

Surprisingly, models trained with cost-guided cardinality estimation achieve improved query performance while having higher prediction error than models trained without this approach, suggesting that prediction error for cardinalities is not necessarily the correct metric to optimize.

G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching

A comprehensive study of the existing cardinality estimation techniques for subgraph matching queries, scaling far beyond the original experiments, reveals that all existing techniques have serious problems in accuracy for various scenarios and datasets.

Selectivity estimation using probabilistic models

The approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time for both single-table multi-attribute queries and a general class of select-join queries.

Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

Alley is a hybrid method that combines sampling and synopses. It outperforms state-of-the-art methods with up to orders of magnitude higher accuracy at similar efficiency, and it has worst-case optimal runtime and approximation quality guarantees for any given error bound ε and required confidence μ.

Join Size Estimation Subject to Filter Conditions

The proposed algorithm, Correlated Sampling, constructs a small space synopsis for each table, which can be used to provide a quick estimate of the join size of this table with other tables subject to dynamically specified predicate filter conditions.

Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities

In this work, we introduce a novel approach to the problem of cardinality estimation over multijoin queries. Our approach leverages randomized hashing and data sketching to tighten these bounds…

Selectivity and Cost Estimation for Joins Based on Random Sampling

A partial ordering that compares the variability of the estimators for the different procedures after an arbitrary fixed number of sampling steps and implies a partial ordering of the corresponding fixed-precision procedures with respect to sampling cost.

Consistent selectivity estimation via maximum entropy

Experiments show that use of the ME approach can improve the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times.

How Good Are Query Optimizers, Really?

This paper introduces the Join Order Benchmark (JOB) and experimentally revisits the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries.

Bloom Histogram: Path Selectivity Estimation for XML Data with Updates