Corpus ID: 237289700

Double Machine Learning and Bad Controls -- A Cautionary Tale

Paul Hunermund, Beyers Louw, Itamar Caspi
Double machine learning (DML) is an increasingly popular tool for automated model selection in high-dimensional settings. At its core, DML assumes unconfoundedness, or exogeneity of all considered controls, which is likely to be violated if the covariate space is large. In this paper, we lay out a theory of bad controls building on the graph-theoretic approach to causality. We then demonstrate, based on simulation studies and an application to real-world data, that DML is very…
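
The bad-control problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the data-generating process, the function names, and the use of a simple linear regression as a stand-in for a machine learning nuisance learner are all assumptions made for illustration. It applies a cross-fitted partialling-out estimator to a control that is a collider (a common effect of treatment and outcome) and compares the result with an estimate that leaves the collider out.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_fit_residuals(x, y, n_folds=2):
    """Residuals of y after out-of-fold linear prediction from x."""
    n = len(x)
    residuals = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        test = [i for i in range(n) if i % n_folds == k]
        a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        for i in test:
            residuals[i] = y[i] - (a + b * x[i])
    return residuals


def partialling_out(y, d, c):
    """Residual-on-residual estimate of the effect of d on y, controlling for c."""
    ry = cross_fit_residuals(c, y)
    rd = cross_fit_residuals(c, d)
    return sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)


def simulate(n=4000, theta=1.0, seed=0):
    """Hypothetical DGP: d -> y with effect theta; c is caused by both d and y."""
    rng = random.Random(seed)
    d = [rng.gauss(0, 1) for _ in range(n)]
    y = [theta * di + rng.gauss(0, 1) for di in d]
    c = [di + yi + rng.gauss(0, 1) for di, yi in zip(d, y)]  # collider
    return y, d, c


y, d, c = simulate()
theta_no_control = fit_simple_ols(d, y)[1]   # treatment is exogenous here
theta_bad_control = partialling_out(y, d, c)  # conditions on the collider

print(f"true effect:            1.000")
print(f"estimate without c:     {theta_no_control:.3f}")
print(f"estimate controlling c: {theta_bad_control:.3f}")
```

In this assumed design the treatment is exogenous, so the regression without controls recovers the true effect of 1, while conditioning on the collider pushes the estimate toward 0 in large samples. Cross-fitting and residualization do nothing to correct this, because the bias comes from the choice of control and not from the estimation of the nuisance functions.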

Double/Debiased Machine Learning for Treatment and Structural Parameters
We revisit the classic semiparametric problem of inference on a low dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0. We depart from the classical setting by…
Machine Labor
Machine learning (ML) is mostly a predictive enterprise, while the questions of interest to labor economists are mostly causal. In pursuit of causal effects, however, ML may be useful for automated…
A Crash Course in Good and Bad Controls
Many students, especially in econometrics, express frustration with the way a problem known as “bad control” is evaded, if not mishandled, in the traditional literature. The problem arises when the…
Inference on Treatment Effects after Selection Amongst High-Dimensional Controls
This work develops a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the "post-double-selection" method, which resolves the problem of uniform inference after model selection for a large, interesting class of models.
Estimating Identifiable Causal Effects through Double Machine Learning
This paper introduces a complete identification algorithm that returns an influence function (IF) for any identifiable causal functional and shows that DML-ID estimators hold the key properties of debiasedness and doubly robustness.
Program evaluation and causal inference with high-dimensional data
This paper shows that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters, and provides results on honest inference for (function-valued) parameters within this general framework where any high-quality, modern machine learning methods can be used to learn the nonparametric/high-dimensional components of the model.
Causal Inference and Data-Fusion in Econometrics
Recent advances in this literature that have the potential to contribute to econometric methodology along three dimensions provide a unified and comprehensive framework for causal inference, in which the aforementioned problems can be addressed in full generality.
The Impact of Machine Learning on Economics
An assessment of the early contributions of machine learning to economics, as well as predictions about its future contributions, and some highlights from the emerging econometric literature combining machine learning and causal inference.
Causal inference and the data-fusion problem
This work addresses the problem of data fusion (piecing together multiple datasets collected under heterogeneous conditions to obtain valid answers to queries of interest) and presents a general, nonparametric framework for handling these biases.
Identification, Inference and Sensitivity Analysis for Causal Mediation Effects
Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the…