Balsa: Learning a Query Optimizer Without Expert Demonstrations

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica
Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe… 


Neo: A Learned Query Optimizer
Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers and in some cases even surpasses them.
Bao: Making Learned Query Optimization Practical
Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints, and combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm.
Learning to Optimize Join Queries With Deep Reinforcement Learning
This work proposes DQ, an RL-based optimizer that currently optimizes select-project-join blocks, and implements three versions of DQ to illustrate the ease of integration into existing DBMSes.
SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
This work presents SkinnerDB, a novel database management system designed from the ground up for reliable optimization and robust performance; the authors claim that its execution strategies are the first to provide comparable formal guarantees.
Towards a Learning Optimizer for Shared Clouds
This work presents a machine-learning-based approach that learns cardinality models from previous job executions and uses them to predict cardinalities in future jobs, and describes the feedback loop that applies the learned models to future job executions.
DeepDB: Learn from Data, not from Queries!
The results of the empirical evaluation demonstrate that the data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
Selectivity Estimation for Range Predicates using Lightweight Models
This work proposes two simple yet effective design choices, i.e., regression label transformation and feature engineering, motivated by the selectivity estimation context, and shows that the proposed models deliver both highly accurate estimates as well as fast estimation.
How Good Are Query Optimizers, Really?
This paper introduces the Join Order Benchmark (JOB) and experimentally revisits the main components of the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries.
Learning a Partitioning Advisor for Cloud Databases
A new learned partitioning advisor based on Deep Reinforcement Learning (DRL) for OLAP-style workloads is introduced that is able to find non-trivial partitionings for a wide range of workloads and outperforms more classical approaches for automated partitioning design.
Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning
This work proposes Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep-learning-based approaches and modifies the objective function of the neural language model so that it can be used to estimate the selectivities of pattern-matching queries.