ZaliQL: Causal Inference from Observational Data at Scale

Babak Salimi, Corey Cole, Dan R. K. Ports, Dan Suciu
Proc. VLDB Endow.
Causal inference from observational data is a subject of active research and development in statistics and computer science. Many statistical software packages have been developed for this purpose. However, these toolkits do not scale to large datasets. We propose and demonstrate ZaliQL: a SQL-based framework for drawing causal inference from observational data. ZaliQL supports state-of-the-art methods for causal inference and runs at scale within the PostgreSQL database system. In addition, we…
A Framework for Inferring Causality from Multi-Relational Observational Data using Conditional Independence
The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today's big data world.
HypDB: A Demonstration of Detecting, Explaining and Resolving Bias in OLAP queries
This work presents HypDB, the first system to detect, explain and resolve bias in OLAP queries, and demonstrates step-by-step how it eliminates the bias via query rewriting and generates decision-support insights.
Bias in OLAP Queries: Detection, Explanation, and Removal
A novel technique is proposed that gives explanations for bias, thus assisting an analyst in understanding what goes on, and an automated method for rewriting a biased query into an unbiased query, which shows what the analyst intended to examine.
FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference
This work proposes a method that computes high quality almost-exact matches for high-dimensional categorical datasets, and leverages techniques that are natural for query processing in the area of database management to perform matching efficiently for large datasets.
Collapsing-Fast-Large-Almost-Matching-Exactly: A Matching Method for Causal Inference
Notable advantages of the method over existing matching procedures are its high-quality matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible.
[Figure: HypDB. Carriers Delay by Airport (Simpson's Paradox); panels: Biased Query, Query Answers, Explanations for Bias]
Online analytical processing (OLAP) is an essential element of decision-support systems. OLAP tools provide insights and understanding needed for improved decision making. However, the answers to
On the relevance of data science for flight delay research: a systematic review
This work proposes a taxonomy of data science techniques used for investigating flight delay studies, and offers a systematic literature review that describes the trends of the field and methods to analyse the applicability of newly proposed methods.


Causal Inference without Balance Checking: Coarsened Exact Matching
We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We
Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference
A unified approach is proposed that makes it possible for researchers to preprocess data with matching and then apply the best parametric techniques they would have used anyway; this procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
PostGIS in Action
The second edition of PostGIS in Action teaches readers of all levels to write spatial queries that solve real-world geodata problems, and shows how to optimize queries for maximum speed, simplify geometries for greater efficiency, and create custom functions for their own applications.
Causality: Models, Reasoning and Inference
1. Introduction to probabilities, graphs, and causal models 2. A theory of inferred causation 3. Causal diagrams and the identification of causal effects 4. Actions, plans, and direct effects 5.
cem: Software for Coarsened Exact Matching
The program implements the coarsened exact matching (CEM) algorithm, described below, which may be used alone or in combination with any existing matching method.
Computing Iceberg Queries Efficiently
This work proposes efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing.
Causal Inference Using Potential Outcomes
Causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units. Observed values of the potential outcomes are revealed by the assignment
Iceberg-cube algorithms: An empirical evaluation on synthetic and real data
Bottom-up and top-down methods are implemented for the Iceberg-Cube problem to identify the combinations of values for a set of attributes for which a specified aggregation function yields values over a specified aggregate threshold.
Observational studies.
Clinicians and researchers should be familiar with observational studies so they may better evaluate a proposed causal relationship and the quality of reports claiming such relationships, and determine if the findings are valid and applicable to their patient population.
Comment: Understanding Simpson’s Paradox
I thank the editor, Ronald Christensen, for the opportunity to discuss this important topic and to comment on the article by Armistead. Simpson’s paradox is often presented as a compelling