# Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

@article{Cheng2018EfficientAR, title={Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data}, author={David Cheng and Ashwin N. Ananthakrishnan and Tianxi Cai}, journal={arXiv: Methodology}, year={2018} }

There is strong interest in conducting comparative effectiveness research (CER) in electronic medical records (EMR) to evaluate treatment strategies among real-world patients. Inferring causal effects in EMR data, however, is challenging due to the lack of direct observation on pre-specified gold-standard outcomes, in addition to the observational nature of the data. Extracting gold-standard outcomes often requires labor-intensive medical chart review, which is infeasible for large studies…

#### 4 Citations

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data

- Computer Science, Mathematics
- arXiv
- 2020

This work derives the semiparametric efficiency lower bounds of the average treatment effect (ATE) both with and without the presence of surrogates, as well as in several intermediary settings, and proposes ATE estimators and inferential methods that use flexible machine learning methods to estimate the nuisance parameters appearing in the influence functions.
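Influence-function-based ATE estimators of this kind build on the augmented inverse probability weighting (AIPW) form. The sketch below is a minimal illustration, not the surrogate-assisted estimator of that work. It assumes the nuisance functions have already been fitted (for example with any machine learning method, ideally cross-fitted): the propensity score `e_hat` and the outcome regressions `mu1_hat` and `mu0_hat`. The function name `aipw_ate` is hypothetical.

```python
import numpy as np


def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the ATE with an influence-function standard error.

    Hypothetical helper. y: outcomes; t: binary treatment (0/1);
    e_hat: fitted P(T=1 | X); mu1_hat, mu0_hat: fitted E[Y | X, T=1], E[Y | X, T=0].
    """
    y, t, e_hat, mu1_hat, mu0_hat = (
        np.asarray(a, float) for a in (y, t, e_hat, mu1_hat, mu0_hat)
    )
    # Per-subject (uncentered) influence-function values of the ATE
    psi = (mu1_hat - mu0_hat
           + t * (y - mu1_hat) / e_hat
           - (1 - t) * (y - mu0_hat) / (1 - e_hat))
    ate = psi.mean()
    # Standard error from the sample variance of the influence-function values
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return ate, se
```

A Wald interval follows as `ate ± 1.96 * se`. This interval is valid only if the nuisance estimates converge fast enough, which is the usual motivation for cross-fitting.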

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

- Mathematics
- 2021

Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate…

High-dimensional semi-supervised learning: in search of optimal inference of the mean

- Mathematics
- Biometrika
- 2021

A fundamental challenge in semi-supervised learning lies in the observed data’s disproportional size when compared with the size of the data collected with missing outcomes. An implicit…
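A common way to exploit a large unlabeled sample when estimating a mean is regression adjustment. You fit a working model on the labeled subset, average its predictions over all subjects, and add back the mean labeled residual. The sketch below assumes labels are missing completely at random and uses a linear working model. It is a generic illustration, not the high-dimensional procedure proposed in the Biometrika paper, and `ss_mean` is a hypothetical name.

```python
import numpy as np


def ss_mean(x_lab, y_lab, x_all):
    """Regression-adjusted semi-supervised estimate of E[Y] (hypothetical helper).

    x_lab, y_lab: covariates and outcomes for the labeled subset.
    x_all: covariates for all subjects, labeled and unlabeled.
    """
    x_lab, y_lab, x_all = (np.asarray(a, float) for a in (x_lab, y_lab, x_all))
    # Linear working model with an intercept, fitted on labeled data only
    design_lab = np.column_stack([np.ones(len(x_lab)), x_lab])
    coef, *_ = np.linalg.lstsq(design_lab, y_lab, rcond=None)
    # Average predictions over every subject, including the unlabeled ones
    design_all = np.column_stack([np.ones(len(x_all)), x_all])
    pred_all = design_all @ coef
    # Mean labeled residual: numerically ~0 for OLS with an intercept, but it
    # matters if the working model is swapped for one without that property
    resid_lab = y_lab - design_lab @ coef
    return pred_all.mean() + resid_lab.mean()
```

The efficiency gain over the labeled-only mean comes from averaging the predictions over the much larger covariate sample.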

Optimal Sampling for Generalized Linear Models Under Measurement Constraints

- Mathematics
- 2019

Under “measurement constraints,” responses are expensive to measure and initially unavailable on most of the records in the dataset, but the covariates are available for the entire dataset. Our…

#### References

Showing 10 of 34 references.

Estimating Average Treatment Effects with a Response-Informed Calibrated Propensity Score

- Mathematics
- 2017

Approaches based on propensity score (PS) modeling are often used to estimate causal treatment effects in observational studies. The performance of inverse probability weighting (IPW) and…
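For reference, a normalized (Hájek) IPW estimator of the ATE can be sketched as follows, assuming the propensity scores `ps` have already been estimated. This is the standard IPW baseline the abstract mentions, not the response-informed calibrated propensity score it proposes. The name `ipw_ate` is hypothetical.

```python
import numpy as np


def ipw_ate(y, t, ps):
    """Normalized (Hajek) IPW estimate of the ATE (hypothetical helper)."""
    y, t, ps = (np.asarray(a, float) for a in (y, t, ps))
    w1 = t / ps              # inverse-probability weights, nonzero for treated only
    w0 = (1 - t) / (1 - ps)  # inverse-probability weights, nonzero for controls only
    # Weighted mean outcome in each arm, with weights normalized to sum to one
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```

Normalizing the weights within each arm keeps the arm-specific means inside the range of observed outcomes. This makes the estimator less sensitive to extreme propensity scores than the unnormalized version.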

Estimating average treatment effects with a double-index propensity score

- Mathematics, Medicine
- Biometrics
- 2019

A novel PS estimator, the Double-index Propensity Score (DiPS), is proposed, in which the treatment status is smoothed over the linear predictors for X from both initial working models, which leads to gains in efficiency and robustness over traditional doubly-robust estimators.
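The smoothing step can be sketched as a Nadaraya-Watson regression of treatment status on the two estimated linear indices. This is a simplified illustration under stated assumptions. The Gaussian kernel, the standardization of each index, and the single `bandwidth` are choices made here, not necessarily those of the DiPS authors. `alpha` and `beta` stand for coefficients already estimated from the propensity and outcome working models. The function name `dips_propensity` is hypothetical.

```python
import numpy as np


def dips_propensity(x, t, alpha, beta, bandwidth):
    """Kernel-smoothed P(T=1 | X'alpha, X'beta) (hypothetical, simplified helper).

    x: (n, p) covariates; t: binary treatment; alpha, beta: (p,) coefficients
    from the initial propensity-score and outcome working models.
    """
    x, t = np.asarray(x, float), np.asarray(t, float)
    idx = np.column_stack([x @ alpha, x @ beta])
    # Assumption: standardize each index so one bandwidth suits both coordinates
    idx = (idx - idx.mean(axis=0)) / idx.std(axis=0)
    # Pairwise differences between subjects in the 2-D index space
    diff = idx[:, None, :] - idx[None, :, :]
    # Gaussian product kernel weights
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2) / bandwidth ** 2)
    # Locally weighted average of treatment status around each subject
    return k @ t / k.sum(axis=1)
```

The smoothed scores can then be plugged into an IPW estimator. The pairwise kernel matrix uses O(n²) memory, so this form suits only small samples.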

Miscellanea. A robust imputation method for surrogate outcome data

- Mathematics
- 2000

We consider estimation for regression analysis with surrogate or auxiliary outcome data. Assume that the regression model for the conditional mean of the outcome is a known function of a…

Information Recovery in a Study With Surrogate Endpoints

- Mathematics
- 2003

Recently, there has been a lot of interest in statistical methods for analyzing data with surrogate endpoints. In this article, we consider parameter estimation from a model that relates a variable Y…

Causal inference with missing exposure information: Methods and applications to an obstetric study

- Mathematics, Medicine
- Statistical Methods in Medical Research
- 2016

This article describes and compares a collection of methods based on different modeling assumptions, under standard assumptions for missing data and for causal inference with complete data, that are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor.

Inference using surrogate outcome data and a validation sample

- Mathematics
- 1992

In the context of estimating $\beta$ from the regression model $P(Y \mid X)$, relating response $Y$ to covariates $X$, suppose that only a surrogate response $S$ is available for most study subjects.…

A mean score method for missing and auxiliary covariate data in regression models

- Mathematics
- 1995

We consider regression analysis when incomplete or auxiliary covariate data are available for all study subjects and, in addition, for a subset called the validation sample, true covariate…

Doubly robust estimators of causal exposure effects with missing data in the outcome, exposure or a confounder

- Mathematics, Medicine
- Statistics in Medicine
- 2012

We consider the estimation of the causal effect of a binary exposure on a continuous outcome. Confounding and missing data are both likely to occur in practice when observational data are used to…
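When only the outcome is missing, a doubly robust ATE estimator can weight each observed residual by the inverse of both the treatment propensity and the probability of observing Y. The following is a minimal sketch under these assumptions:

- Outcomes are missing at random given covariates and treatment.
- All nuisance fits (`e_hat`, `p_hat`, `mu1_hat`, `mu0_hat`) are supplied.

It does not handle the missing-exposure or missing-confounder cases the paper also covers, and `aipw_ate_missing_y` is a hypothetical name.

```python
import numpy as np


def aipw_ate_missing_y(y, t, r, e_hat, p_hat, mu1_hat, mu0_hat):
    """Doubly robust ATE when Y is observed only where r == 1 (hypothetical helper).

    e_hat: fitted P(T=1 | X); p_hat: fitted P(R=1 | X, T);
    mu1_hat, mu0_hat: fitted E[Y | X, T=1], E[Y | X, T=0].
    """
    t, r, e_hat, p_hat, mu1_hat, mu0_hat = (
        np.asarray(a, float) for a in (t, r, e_hat, p_hat, mu1_hat, mu0_hat)
    )
    # Unobserved outcomes (e.g. NaN) are zeroed; their terms are multiplied by r == 0
    y = np.where(r == 1, np.asarray(y, float), 0.0)
    psi = (mu1_hat - mu0_hat
           + r * t / (p_hat * e_hat) * (y - mu1_hat)
           - r * (1 - t) / (p_hat * (1 - e_hat)) * (y - mu0_hat))
    return psi.mean()
```

The estimate stays consistent if the outcome regressions are correct, or if both the treatment and observation probability models are correct.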

Semiparametric Estimation of Treatment Effect in a Pretest-Posttest Study with Missing Data

- Psychology, Medicine
- Statistical Science: A Review Journal of the Institute of Mathematical Statistics
- 2005

It is shown how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class, and how the theoretical results translate into practice.

Robust Model-Based Inference for Incomplete Data via Penalized Spline Propensity Prediction

- Mathematics, Computer Science
- Commun. Stat. Simul. Comput.
- 2008

Three approaches to standard error estimation that incorporate the uncertainty due to nonresponse are considered: standard errors based on the asymptotic variance of the PSPP estimator, ignoring sampling error in estimating the response propensity; standard errors based on the bootstrap method; and multiple imputation-based standard errors using draws from the joint posterior predictive distribution of missing values under the PSPP model.
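The core idea of penalized spline propensity prediction (PSPP) is to regress the outcome on a penalized spline of the estimated response propensity among respondents, then use that fit to impute nonrespondents. The sketch below is a simplified version under stated assumptions:

- It uses a truncated-linear spline basis with a ridge penalty of fixed weight `lam` on the knot terms.
- It omits any additional covariate terms the full PSPP model may include.
- It treats `logit_ps` as already estimated.

`pspp_mean` is a hypothetical name.

```python
import numpy as np


def pspp_mean(y, r, logit_ps, n_knots=10, lam=1.0):
    """Simplified PSPP-style estimate of E[Y] under nonresponse (hypothetical helper).

    y: outcomes (ignored where r is False); r: response indicator;
    logit_ps: fitted logit of the response propensity for every subject.
    """
    y = np.asarray(y, float)
    r = np.asarray(r, bool)
    s = np.asarray(logit_ps, float)
    # Knots at interior quantiles of the respondents' logit propensities
    knots = np.quantile(s[r], np.linspace(0, 1, n_knots + 2)[1:-1])
    # Truncated-linear basis: intercept, linear term, and (s - knot)_+ terms
    basis = np.column_stack(
        [np.ones_like(s), s, np.maximum(s[:, None] - knots[None, :], 0.0)]
    )
    # Ridge penalty on the knot coefficients only (intercept and slope unpenalized)
    pen = np.diag([0.0, 0.0] + [lam] * n_knots)
    b = basis[r]
    coef = np.linalg.solve(b.T @ b + pen, b.T @ y[r])
    fitted = basis @ coef
    # Keep observed outcomes, impute nonrespondents, and average over everyone
    return np.where(r, y, fitted).mean()
```

Modeling the outcome as a smooth function of the response propensity is what provides the robustness. If that relationship is captured, the imputations correct the bias of the complete-case mean even when the rest of the outcome model is crude.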