Are We Ready For Learned Cardinality Estimation?

  Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, Qingqing Zhou. Proc. VLDB Endow.
Cardinality estimation is a fundamental but long-unsolved problem in query optimization. Recently, multiple papers from different research groups have consistently reported that learned models have the potential to replace existing cardinality estimators. In this paper, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. First, we focus on the static environment (i.e., no data updates) and compare five…

Flow-Loss: Learning Cardinality Estimates That Matter

A new loss function, Flow-Loss, is introduced for learning cardinality estimation models that approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans.

Learned Cardinality Estimation: An In-depth Study

A taxonomy and a unified workflow of learned estimators are provided for a better understanding of existing methods, and a deeper understanding of their behavior can provide a more comprehensive and substantial framework for developing better estimators.

Selectivity Functions of Range Queries are Learnable

This paper shows that the selectivity function of a range space with bounded VC-dimension is learnable, and demonstrates that, empirically, even a basic learning algorithm with generic models is able to produce accurate predictions across settings.
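The paper's empirical claim — that even a generic learning algorithm can predict range-query selectivities accurately — can be illustrated with a small sketch. This is a toy setup of my own (synthetic column, random-forest regressor), not the paper's experimental protocol:

```python
# Toy illustration (not the paper's method): learn the selectivity of
# 1-D range queries over a synthetic column with a generic regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
column = rng.normal(50, 15, size=10_000)      # the "table" column

def true_selectivity(lo, hi):
    """Fraction of rows with lo <= value <= hi."""
    return np.mean((column >= lo) & (column <= hi))

# Training workload: random ranges labeled with their true selectivity.
lo = rng.uniform(0, 100, size=2_000)
hi = lo + rng.uniform(0, 50, size=2_000)
X = np.column_stack([lo, hi])
y = np.array([true_selectivity(l, h) for l, h in X])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Held-out queries: a generic model already predicts selectivity closely.
lo_t = rng.uniform(0, 100, size=200)
hi_t = lo_t + rng.uniform(0, 50, size=200)
X_t = np.column_stack([lo_t, hi_t])
pred = model.predict(X_t)
truth = np.array([true_selectivity(l, h) for l, h in X_t])
print(f"mean absolute error: {np.abs(pred - truth).mean():.4f}")
```

The learnability result in the paper is about range spaces of bounded VC-dimension; the sketch only shows the easiest 1-D interval case.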

Experience-Enhanced Learning: One Size Still does not Fit All in Automatic Database

This paper proposes three methodologies for improving learned methods, i.e., label collection for efficient pre-training, a knowledge base for model transfer, and theoretical guarantees for stable performance, and designs a novel experience-enhanced reinforcement learning (EERL) method, which converges efficiently and outperforms general RL models.

Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian Process

A lightweight, accurate, and uncertainty-aware cardinality estimator for SQL queries is proposed, built on Bayesian deep learning (BDL), which serves as a bridge between Bayesian inference and deep learning.

Learning to be a Statistician: Learned Estimator for Number of Distinct Values

This work proposes to formulate the NDV estimation task in a supervised learning framework, and aims to learn a model as the estimator, to offer efficient and accurate NDV estimations for unseen tables and workloads.

FACE: A Normalizing Flow based Cardinality Estimator

A novel cardinality estimator FACE is proposed, which leverages the Normalizing Flow based model to learn a continuous joint distribution for relational data and proposes encoding and indexing techniques to handle Like predicates for string data.

Join Order Optimization with (Almost) No Statistics

  • 2022
By assuming all joins are foreign-key primary-key joins, and by leveraging their inherent properties, this simple cardinality estimator significantly improves the quality of join orders selected by the join order optimizer.

Machine Learning for Data Management: A System View

  • Guoliang Li, Xuanhe Zhou
  • 2022 IEEE 38th International Conference on Data Engineering (ICDE)
  • 2022
This tutorial discusses existing learning-based data management studies and how they solve the above challenges, and provides some future research directions.

Database Optimizers in the Era of Learning

This tutorial reviews recent advances in a decades-old problem, namely query optimization; it presents the early efforts in this area, describes advancements, limitations, and open issues, and discusses future research directions.

Cardinality estimation with local deep learning models

This paper introduces a novel local-oriented approach for cardinality estimation, where the local context is a specific sub-part of the schema, which leads to a better representation of data correlations and thus better estimation accuracy.

DeepDB: Learn from Data, not from Queries!

The results of the empirical evaluation demonstrate that the data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.

Cardinality Estimation: An Experimental Survey

The aim of this paper is to present a detailed experimental study of twelve algorithms of cardinality estimation, scaling far beyond the original experiments, and to evaluate the algorithms' accuracy, runtime, and memory consumption using synthetic and real-world datasets.

Learning to Optimize Join Queries With Deep Reinforcement Learning

This work proposes an RL-based optimizer, DQ, which currently optimizes select-project-join blocks, and implements three versions of DQ to illustrate the ease of integration into existing DBMSes.

An Empirical Analysis of Deep Learning for Cardinality Estimation

It is found that simple deep learning models can learn cardinality estimations across a variety of datasets and lead to better query plans across all datasets, reducing the runtimes by up to 49% on select-project-join workloads.

NeuroCard: One Cardinality Estimator for All Tables

This work shows that it is possible to learn the correlations across all tables in a database without any independence assumptions, and presents NeuroCard, a join cardinality estimator that builds a single neural density estimator over an entire database.

Efficiently approximating selectivity functions using low overhead regression models

This work proposes a novel model construction method that incrementally generates training data and uses approximate selectivity labels, that reduces total construction cost by an order of magnitude while preserving most of the accuracy gains.

Learning to accurately COUNT with query-driven predictive analytics

A novel solution to executing aggregation (specifically COUNT) queries over large-scale data is presented; it is the only query-driven solution applicable to general environments including restricted-access data, offers incremental learning adjusted for arriving ad-hoc queries, and is well suited for big data analytics.

QuickSel: Quick Selectivity Learning with Mixture Models

This paper proposes a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms, and builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries).
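The query-driven, mixture-model idea behind QuickSel can be sketched in a few lines. The sketch below is my own simplification, not QuickSel's actual algorithm: it models a column's density as a mixture of fixed uniform "kernels" and fits the mixture weights to observed (query, selectivity) feedback, which makes each prediction linear in the weights:

```python
# Toy sketch of the query-driven idea (not QuickSel's actual algorithm):
# model the column's density as a mixture of fixed uniform "kernels" and
# fit the mixture weights to observed (query, selectivity) feedback.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
column = rng.normal(50, 15, size=10_000)

edges = np.linspace(0, 100, 11)               # 10 uniform kernels on [0, 100]

def overlap_fractions(lo, hi):
    """Fraction of each kernel's mass that falls inside [lo, hi]."""
    left = np.clip(lo, edges[:-1], edges[1:])
    right = np.clip(hi, edges[:-1], edges[1:])
    return (right - left) / (edges[1:] - edges[:-1])

# Observed workload feedback: ranges with their true selectivities.
los = rng.uniform(0, 90, size=300)
his = los + rng.uniform(5, 40, size=300)
A = np.array([overlap_fractions(l, h) for l, h in zip(los, his)])
b = np.array([np.mean((column >= l) & (column <= h)) for l, h in zip(los, his)])

weights, _ = nnls(A, b)                       # non-negative least-squares fit

def estimate(lo, hi):
    """Predicted selectivity of the range [lo, hi] under the fitted mixture."""
    return float(overlap_fractions(lo, hi) @ weights)

print(f"estimated P(35 <= x <= 65): {estimate(35, 65):.3f}")
```

Because predictions are linear in the weights, refining the model on new feedback is just a small least-squares solve, which is the intuition behind QuickSel's fast refinement; the real system uses a more careful mixture formulation and optimization.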

Cardinality Estimation Done Right: Index-Based Join Sampling

Index-based join sampling is proposed, a novel cardinality estimation technique for main-memory databases that relies on sampling and existing index structures to obtain accurate estimates and significantly improves estimation as well as overall plan quality.