Corpus ID: 207769626

Bias-aware model selection for machine learning of doubly robust functionals

Yifan Cui and Eric J. Tchetgen Tchetgen
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, model selection of possibly high dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a new model selection framework for making inferences about a finite dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function. The class of such doubly robust functionals is quite large… 
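The prototypical member of this class of doubly robust functionals is the average treatment effect, whose doubly robust (augmented inverse probability weighted, AIPW) estimating function combines a propensity score model and an outcome regression, and remains consistent if either nuisance model is correctly specified. A minimal sketch on simulated data (the data-generating process, model choices, and variable names here are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Simulated observational data: confounder X, binary treatment A, outcome Y.
X = rng.normal(size=(n, 1))
p_true = 1 / (1 + np.exp(-X[:, 0]))          # true propensity score
A = rng.binomial(1, p_true)
Y = 2.0 * A + X[:, 0] + rng.normal(size=n)   # true treatment effect = 2

# Nuisance working models: propensity pi(X) and outcome regressions m_a(X).
pi_hat = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
m1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
m0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)

# AIPW (doubly robust) estimate of the average treatment effect:
# unbiased if either the propensity model or the outcome model is correct.
psi = np.mean(
    m1 - m0
    + A * (Y - m1) / pi_hat
    - (1 - A) * (Y - m0) / (1 - pi_hat)
)
print(round(psi, 2))  # close to the true effect of 2
```

In this sketch both working models happen to be correctly specified; the double robustness property means the estimate would remain consistent if only one of the two were.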


Model selection for estimation of causal parameters
A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy.
On doubly robust inference for double machine learning
Due to concerns about parametric model misspecification, there is interest in using machine learning to adjust for confounding when evaluating the causal effect of an exposure on an outcome.
Double/debiased machine learning for logistic partially linear model
We propose double/debiased machine learning approaches to infer (at the parametric rate) the parametric component of a logistic partially linear model with a binary response.
Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond
It is proved that under lax rate conditions on nuisances, the estimator has the same favorable asymptotic behavior as the infeasible oracle estimator that solves the estimating equation with the unknown true nuisance functions.
Double-robust and efficient methods for estimating the causal effects of a binary treatment
We consider the problem of estimating the effects of a binary treatment on a continuous outcome of interest from observational data in the absence of confounding by unmeasured factors.
Semiparametric proximal causal inference
This paper considers the framework of proximal causal inference introduced by Tchetgen Tchetgen et al. (2020), which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails.
On Semiparametric Instrumental Variable Estimation of Average Treatment Effects through Data Fusion
Suppose one is interested in estimating causal effects in the presence of potentially unmeasured confounding with the aid of a valid instrumental variable.
Generalized interpretation and identification of separable effects in competing event settings.
A definition of separable effects is proposed that is applicable to general time-varying structures, where the separable effects can still be meaningfully interpreted, even when they cannot be regarded as direct and indirect effects.
On Nearly Assumption-Free Tests of Nominal Confidence Interval Coverage for Causal Parameters Estimated by Machine Learning
For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators ψ̂₁ are the state-of-the-art, incorporating the good prediction performance of machine learning.


Data-Adaptive Bias-Reduced Doubly Robust Estimation
An asymptotic linearity theorem is provided which gives the influence function of the proposed doubly robust estimator under correct specification of a parametric nuisance working model for the missingness mechanism/propensity score but a possibly misspecified (finite- or infinite-dimensional) outcome working model.
High-dimensional doubly robust tests for regression parameters
This work proposes tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified.
A unifying approach for doubly-robust $\ell_1$ regularized estimation of causal contrasts
We consider inference about a scalar parameter under a non-parametric model based on a one-step estimator, computed as a plug-in estimator plus the empirical mean of an estimator of the parameter's influence function.
Bias-Reduced Doubly Robust Estimation
Over the past decade, doubly robust estimators have been proposed for a variety of target parameters in causal inference and missing data models. These are asymptotically unbiased when at least one of two nuisance working models is correctly specified.
Double/Debiased Machine Learning for Treatment and Structural Parameters
This work revisits the classic semiparametric problem of inference on a low dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0 and proves that DML delivers point estimators that concentrate in a N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
Demystifying a class of multiply robust estimators
For estimating the population mean of a response variable subject to ignorable missingness, a new class of methods, called multiply robust procedures, has been proposed. These procedures remain consistent if any one of multiple working models is correctly specified.
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Minimax estimation of a functional on a structured high-dimensional model
We introduce a new method of estimation of parameters in semi-parametric and nonparametric models. The method is based on estimating equations that are U-statistics in the observations.
On Model Selection Consistency of Lasso
It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
Estimation of Regression Coefficients When Some Regressors are not Always Observed
In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects, either by design or happenstance.