Corpus ID: 88517061

Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

Authors: David Cheng, Ashwin N. Ananthakrishnan, Tianxi Cai
Journal: arXiv: Methodology
There is strong interest in conducting comparative effectiveness research (CER) in electronic medical records (EMR) to evaluate treatment strategies among real-world patients. Inferring causal effects in EMR data, however, is challenging due to the lack of direct observation on pre-specified gold-standard outcomes, in addition to the observational nature of the data. Extracting gold-standard outcomes often requires labor-intensive medical chart review, which is infeasible for large studies…

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data
This work derives the semiparametric efficiency lower bounds of the average treatment effect (ATE) both with and without the presence of surrogates, as well as in several intermediary settings, and proposes ATE estimators and inferential methods based on flexible machine learning methods to estimate nuisance parameters that appear in the influence functions.
Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction
Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate…
High-dimensional semi-supervised learning: in search of optimal inference of the mean
A fundamental challenge in semi-supervised learning lies in the observed data’s disproportional size when compared with the size of the data collected with missing outcomes. An implicit…
Optimal Sampling for Generalized Linear Models Under Measurement Constraints
Under “measurement constraints,” responses are expensive to measure and initially unavailable for most of the records in the dataset, but the covariates are available for the entire dataset. Our…


Estimating Average Treatment Effects with a Response-Informed Calibrated Propensity Score
Approaches based on propensity score (PS) modeling are often used to estimate causal treatment effects in observational studies. The performance of inverse probability weighting (IPW) and…
Estimating average treatment effects with a double-index propensity score.
A novel PS estimator, the Double-index Propensity Score (DiPS), is proposed, in which the treatment status is smoothed over the linear predictors for X from both the initial working models, which leads to gains in efficiency and robustness over traditional doubly-robust estimators.
Miscellanea. A robust imputation method for surrogate outcome data
We consider estimation for regression analysis with surrogate or auxiliary outcome data. Assume that the regression model for the conditional mean of the outcome is a known function of a…
Information Recovery in a Study With Surrogate Endpoints
Recently, there has been a lot of interest in statistical methods for analyzing data with surrogate endpoints. In this article, we consider parameter estimation from a model that relates a variable Y…
Causal inference with missing exposure information: Methods and applications to an obstetric study
This article describes and compares a collection of methods based on different modeling assumptions, under standard assumptions for missing data and for causal inference with complete data. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor.
Inference using surrogate outcome data and a validation sample
In the context of estimating β from the regression model P(Y | X), relating response Y to covariates X, suppose that only a surrogate response S is available for most study subjects.
A mean score method for missing and auxiliary covariate data in regression models
We consider regression analysis when incomplete or auxiliary covariate data are available for all study subjects and, in addition, for a subset called the validation sample, true covariate…
Doubly robust estimators of causal exposure effects with missing data in the outcome, exposure or a confounder.
We consider the estimation of the causal effect of a binary exposure on a continuous outcome. Confounding and missing data are both likely to occur in practice when observational data are used to…
Semiparametric Estimation of Treatment Effect in a Pretest-Posttest Study with Missing Data.
It is shown how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class, and how the theoretical results translate into practice.
Robust Model-Based Inference for Incomplete Data via Penalized Spline Propensity Prediction
  • H. An, R. Little
  • Mathematics, Computer Science
  • Commun. Stat. Simul. Comput.
  • 2008
Three approaches to standard error estimation incorporating the uncertainty due to nonresponse are considered: standard errors based on the asymptotic variance of the PSPP estimator, ignoring sampling error in estimating the response propensity; standard errors based on the bootstrap method; and multiple imputation-based standard errors using draws from the joint posterior predictive distribution of missing values under the PSPP model.