Comparing Covariate Prioritization via Matching to Machine Learning Methods for Causal Inference Using Five Empirical Applications

Luke J. Keele and Dylan S. Small
The American Statistician, pages 355-363
Abstract: When investigators seek to estimate causal effects, they often assume that selection into treatment is based only on observed covariates. Under this identification strategy, analysts must adjust for observed confounders. While basic regression models have long been the dominant method of statistical adjustment, methods based on matching or weighting have become more common. Of late, methods based on machine learning (ML) have been developed for statistical adjustment. These ML methods…

Confounder selection strategies targeting stable treatment effect estimators

The ability of the proposed confounder selection strategy to correctly select confounders, and to ensure valid inference of the treatment effect following data-driven covariate selection, is assessed empirically and compared with existing methods using simulation studies.

Randomization Tests to Assess Covariate Balance When Designing and Analyzing Matched Datasets

Through simulation and a real application in political science, this work finds that matched datasets with high levels of covariate balance tend to approximate balance-constrained designs like rerandomization, and analyzing them as such can lead to precise causal analyses.

All models are wrong, but which are useful? Comparing parametric and nonparametric estimation of causal effects in finite samples

A novel approach is proposed that evaluates performance across thousands of data-generating mechanisms drawn from nonparametric models with semi-informative priors. The nonparametric estimator is found to nearly always outperform the parametric estimators, the exceptions being similar performance in terms of bias and slightly worse performance at the smallest sample sizes.

High Resolution Treatment Effects Estimation: Uncovering Effect Heterogeneities with the Modified Causal Forest

There is great demand for inferring causal effect heterogeneity and for open-source statistical software, which is readily available for practitioners. The mcf package is an open-source Python

Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences

This paper provides a comprehensive, systematic meta-mapping of research questions in the social and health sciences to appropriate ML approaches, incorporating the requirements of statistical analysis in these disciplines.

Comparing the Performance of Statistical Adjustment Methods by Recovering the Experimental Benchmark from the REFLUX Trial

  • L. Keele, S. O'Neill, R. Grieve
  • Economics
    Medical Decision Making: An International Journal of the Society for Medical Decision Making
  • 2021
It is found that simple propensity score matching methods provide the least accurate estimates versus the RCT benchmark, and future studies should use multiple methods of estimation to fully represent uncertainty according to the choice of estimation approach.

Innovations in Randomization Inference for the Design and Analysis of Experiments and Observational Studies

This dissertation proposes how to implement rerandomization in factorial experiments, extends the theoretical properties of rerandomization from single-factor experiments to 2^K factorial designs, and demonstrates how a designed experiment can improve the precision of estimated factorial effects.

A Survey of Causal Inference Frameworks

This survey aims to provide a review of the past work on causal inference, focusing mainly on potential outcomes framework and causal graphical models, to help accelerate the understanding of causal inference in different domains.

Comment: Will Competition-Winning Methods for Causal Inference Also Succeed in Practice?

First, we would like to congratulate the authors for successfully hosting the causal inference data competition (referred to as Competition henceforth) and contributing a unique and

Spatial and Spatiotemporal Matching Framework for Causal Inference (Short Paper)

Matching is a procedure aimed at reducing the impact of observational data bias in causal analysis. Designing matching methods for spatial data reflecting static spatial or dynamic spatio-temporal



Optimizing matching and analysis combinations for estimating causal effects

Simulation results indicate that combining full matching with double robust analysis performed best in both the simulations and the applied example, particularly when combined with machine learning estimation methods.

Bayesian Nonparametric Modeling for Causal Inference

Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting

Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets

This paper describes the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross-validation, and an ensemble learner that creates a single partition of the data into training and validation sets.

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

  • Stefan Wager, S. Athey
  • Mathematics, Computer Science
    Journal of the American Statistical Association
  • 2018
This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference and is found to be substantially more powerful than classical methods based on nearest-neighbor matching.

Kernel Balancing: A Flexible Non-Parametric Weighting Procedure for Estimating Causal Effects

Methods such as matching and weighting for causal effect estimation attempt to adjust the joint distribution of observed covariates for treated and control units to be the same. However, they often

Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference

A unified approach is proposed that lets researchers preprocess data with matching and then apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.

Semiparametric causal inference in matched cohort studies

Odds ratios can be estimated in case-control studies using standard logistic regression, ignoring the outcome-dependent sampling. In this paper we discuss an analogous result for treatment effects on

Double/Debiased Machine Learning for Treatment and Causal Parameters

This work forms an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions, and builds a debiased estimator of the target parameter that typically converges at the fastest possible 1/root(n) rate, is approximately unbiased and normal, and yields valid confidence intervals for the parameters of interest.

Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition

The causal inference data analysis challenge, "Is Your SATT Where It's At?", launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both the data testing grounds and the researchers submitting methods whose efficacy would be evaluated.

Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects

The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively "shrink to homogeneity".