• Corpus ID: 219721046

Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes

Susan Athey, Raj Chetty, Guido Imbens. arXiv: Methodology.

There has been an increase in interest in experimental evaluations to estimate causal effects, partly because their internal validity tends to be high. At the same time, as part of the big data revolution, large, detailed, and representative administrative data sets have become more widely available. However, the credibility of estimates of causal effects based on such data sets alone can be low. In this paper, we develop statistical methods for systematically combining experimental and… 

A Simple Estimator for Estimating Treatment Effects Using Observational Data and Experimental Data

When estimating treatment effects, the gold standard is to conduct a randomized experiment and then contrast outcomes associated with the treatment group and the control group. However, in many

Causal inference methods for combining randomized trials and observational studies: a review

This paper first discusses identification and estimation methods that improve generalizability of randomized controlled trials (RCTs) using the representativeness of observational data, and methods that combine RCTs and observational data to improve the (conditional) average treatment effect estimation.

Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects

We consider the task of identifying and estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The

Combining Observational and Experimental Data Using First-stage Covariates

A method is proposed that combines experimental and observational datasets when units from these two datasets are sampled from the same population and some characteristics of these units are observed, and it is shown that these characteristics can partially explain treatment assignment in the observational data.

Adaptive Combination of Conditional Average Treatment Effects Based on Randomized and Observational Data

Data from both a randomized trial and an observational study are sometimes simultaneously available for evaluating the effect of an intervention. The randomized data typically allows for reliable

Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects

This paper proposes to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data via representation learning, and introduces a sample-efficient algorithm called CorNet.

Combining Interventional and Observational Data Using Causal Reductions

A causal reduction method is proposed that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that takes values in the same space as the treatment variable, without changing the observational and interventional distributions the causal model entails.

Long-term Causal Inference Under Persistent Confounding via Data Combination

We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay,

Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data

A framework is presented that allows one to employ machine learning algorithms to learn from the observational data, and use the resulting models to improve precision in randomized experiments, and there is no requirement that the machine learning models contributed equally.

Combining Experimental and Observational Studies in Meta-Analysis: A Mutual Debiasing Approach

We propose a method for aggregating evidence from observational studies, which may be subject to internal selection bias, and randomized controlled trials (RCTs) which may be subject to site



Propensity score methods for merging observational and experimental datasets

It is found that a version of the spiked-in estimator yields lower-MSE estimates of the causal impact of HT on coronary heart disease than would be achieved using either a small RCT or the observational component on its own.

Combining Observational and Experimental Datasets Using Shrinkage Estimators

This work proposes a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. It develops two new estimators, proves finite-sample conditions under which they have lower risk than an estimator using only experimental data, and shows that each achieves a notion of asymptotic optimality.

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data

This work derives the semiparametric efficiency lower bounds for the average treatment effect (ATE) with and without the presence of surrogates, as well as in several intermediary settings, and proposes ATE estimators and inferential methods that use flexible machine learning methods to estimate the nuisance parameters appearing in the influence functions.

Removing Hidden Confounding by Experimental Grounding

This work introduces a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data.

Estimating causal effects of treatments in randomized and nonrandomized studies.

A discussion of matching, randomization, random sampling, and other methods of controlling extraneous variation is presented. The objective is to specify the benefits of randomization in estimating

Identification and Extrapolation of Causal Effects with Instrumental Variables

Instrumental variables (IV) are widely used in economics to address selection on unobservables. Standard IV methods produce estimates of causal effects that are specific to individuals whose behavior

The central role of the propensity score in observational studies for causal effects

The results of observational studies are often disputed because of nonrandom treatment assignment. For example, patients at greater risk may be overrepresented in some treatment group.

The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely

A common challenge in estimating the long-term impacts of treatments (e.g., job training programs) is that the outcomes of interest (e.g., lifetime earnings) are observed with a long delay. We

The Role of the Propensity Score in Estimating Dose-Response Functions

Estimation of average treatment effects in observational, or non-experimental, studies often requires adjustment for differences in pre-treatment variables. If the number of pre-treatment variables is large, and their distribution varies