Explaining Inference Queries with Bayesian Optimization

Brandon Lockhart, Jinglin Peng, Weiyuan Wu, Jiannan Wang, and Eugene Wu. Explaining Inference Queries with Bayesian Optimization. Proc. VLDB Endow.
Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data; such queries are challenging to explain because an explanation may need to be derived from the source, training, or inference data in an ML pipeline. In this paper, we model an objective function as a black-box function and propose BOExplain, a novel… 
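The black-box framing described above can be illustrated with a toy sketch. All names and data here are illustrative, not BOExplain's actual API: we search for a range predicate over one attribute whose removal best resolves an unexpectedly high aggregate, using random search as a simplified stand-in for the Bayesian-optimization loop.

```python
# Toy sketch of the black-box framing (illustrative only, not BOExplain's API):
# search for a range predicate over one attribute whose removal most reduces
# an unexpected aggregate result.
import random

# Hypothetical inference data: (feature_value, predicted_label) pairs, where
# the positive predictions cluster in the feature range [0.4, 0.6].
data = [(x / 100.0, 1 if 0.4 <= x / 100.0 <= 0.6 else 0) for x in range(100)]

def objective(lo, hi):
    """Black-box objective: the aggregate (count of positive predictions)
    after removing tuples whose feature falls in [lo, hi]. Lower means the
    predicate better 'explains' a positive count the analyst found too high."""
    kept = [(v, y) for v, y in data if not (lo <= v <= hi)]
    return sum(y for _, y in kept)

def search_predicate(n_trials=500, seed=0):
    """Stand-in for the Bayesian-optimization loop: sample candidate
    range predicates, evaluate the objective, keep the best (preferring
    narrower predicates on ties)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lo = rng.random()
        hi = lo + rng.random() * (1.0 - lo)
        score = (objective(lo, hi), hi - lo)  # lexicographic: value, then width
        if best is None or score < best[0]:
            best = (score, (lo, hi))
    return best[1]

lo, hi = search_predicate()
```

In BOExplain's actual setting, each objective evaluation is expensive (it may retrain the model or rerun the inference query), which is exactly why Bayesian optimization, rather than naive random search, is the method of choice.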

From Cleaning before ML to Cleaning for ML
Data cleaning is widely regarded as a critical piece of machine learning (ML) applications, as data errors can corrupt models in ways that cause the application to operate incorrectly, unfairly, or…

Explaining Query Answers with Explanation-Ready Databases
This paper proposes a generic framework that supports much richer and more insightful explanations by preparing the database offline, so that top explanations can be found interactively at query time.
A formal approach to finding explanations for database queries
Introduces a principled approach to explaining answers to SQL queries based on intervention: identifying tuples whose removal from the database significantly affects the query answers.
Explaining Aggregates for Exploratory Analytics
In XAXA, explanations for future aggregate queries (AQs) can be computed without any database (DB) access and used to further explore the queried data subspaces without issuing additional queries to the DB.
Complaint-driven Training Data Debugging for Query 2.0
This work proposes Rain, a complaint-driven training data debugging system that allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples so that if they were removed, the complaints would be resolved.
Scorpion: Explaining Away Outliers in Aggregate Queries
This work proposes Scorpion, a system that takes a set of user-specified outlier points in an aggregate query result as input and finds predicates that explain the outliers in terms of properties of the input tuples that are used to compute the selected outlier results.
Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances
This work presents a novel approach for explaining outliers in aggregation queries through counterbalancing, presents efficient methods for mining such aggregate regression patterns (ARPs), and discusses how to use ARPs to generate and rank explanations.
DIFF: A Relational Interface for Large-Scale Data Explanation
This work proposes the DIFF operator, a relational aggregation operator that unifies the core functionality of existing data-explanation engines with declarative relational query processing, and demonstrates how it can provide the same semantics as those engines while capturing a broad set of production use cases in industry.
PerfXplain: Debugging MapReduce Job Performance
PerfXplain provides a new query language for articulating performance queries and an algorithm for generating explanations from a log of past MapReduce job executions, based on techniques related to decision-tree building.
Causality and Explanations in Databases
This tutorial surveys research on causality and explanation in the database and AI communities, giving researchers a snapshot of the current state of the art, and proposes a unified framework as well as directions for future research.
A Tutorial on Bayesian Optimization
This tutorial describes how Bayesian optimization works, including Gaussian process regression and three common acquisition functions (expected improvement, entropy search, and knowledge gradient), and provides a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied.
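As a concrete illustration of the loop the tutorial describes, here is a minimal Bayesian-optimization sketch in plain NumPy: a Gaussian-process surrogate with an RBF kernel, and the expected-improvement acquisition function for minimization. The function names and the toy objective are ours, not from the tutorial.

```python
# Minimal Bayesian optimization sketch: GP surrogate + expected improvement.
import numpy as np
from math import erf

def rbf_kernel(A, B, length=0.3):
    """Squared-exponential (RBF) kernel matrix between 1-D point sets A and B."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    Kss = rbf_kernel(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y_train
    var = np.diag(Kss - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected amount by which f improves on `best`."""
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=15, seed=0):
    """Fit a GP to observed points, pick the next point by maximizing EI
    over a dense grid, evaluate, and repeat."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(*bounds, size=n_init)
    y = np.array([f(v) for v in x])
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x, y, grid)
        ei = expected_improvement(mu, sigma, y.min())
        x_next = grid[np.argmax(ei)]
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    return x[np.argmin(y)], y.min()

# Toy black-box objective with its minimum at x = 0.7.
x_best, y_best = bayes_opt(lambda v: (v - 0.7) ** 2)
```

A production implementation would also fit the kernel hyperparameters and optimize the acquisition function properly rather than over a fixed grid; the sketch keeps only the essential surrogate-then-acquire structure.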