• Corpus ID: 220363647

Cross-Fitting and Averaging for Machine Learning Estimation of Heterogeneous Treatment Effects

  • Daniela Jacob
  • Published 6 July 2020
  • Computer Science
  • arXiv: Methodology
We investigate the finite sample performance of sample splitting, cross-fitting and averaging for the estimation of the conditional average treatment effect. Recently proposed methods, so-called meta-learners, make use of machine learning to estimate different nuisance functions and hence allow for fewer restrictions on the underlying structure of the data. To limit a potential overfitting bias that may result when using machine learning methods, cross-fitting estimators have been proposed… 
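To make the schemes concrete, here is a minimal Python sketch of K-fold cross-fitting, with averaging over repeated random splits. It is an illustration under assumptions, not the paper's implementation. The helper names (`cross_fit_predict`, `average_over_splits`, `ols_learner`) are hypothetical. A simple linear regression stands in for the machine learning nuisance learner, and the median is used as one possible aggregator across splits (the mean is another). Sample splitting would fit on one part and predict on the other without swapping. Cross-fitting instead rotates the folds so that every observation receives an out-of-fold prediction.

```python
import random
from statistics import median
from typing import Callable, List, Sequence

# A learner maps training data (xs, ys) to a prediction function x -> y_hat.
Learner = Callable[[Sequence[float], Sequence[float]], Callable[[float], float]]


def cross_fit_predict(x: Sequence[float], y: Sequence[float], learner: Learner,
                      k: int, seed: int) -> List[float]:
    """K-fold cross-fitting: each observation is predicted by a model trained
    only on the other K-1 folds, so no point is used to fit its own nuisance."""
    n = len(x)
    ids = [i % k for i in range(n)]   # balanced fold labels
    random.Random(seed).shuffle(ids)  # random assignment to folds
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if ids[i] != fold]
        model = learner([x[i] for i in train], [y[i] for i in train])
        for i in range(n):
            if ids[i] == fold:
                preds[i] = model(x[i])
    return preds


def average_over_splits(estimate: Callable[[int], float], n_splits: int,
                        agg: Callable[[List[float]], float] = median) -> float:
    """Repeat a cross-fitted estimator over several random splits (one seed each)
    and aggregate the results to reduce dependence on a single split."""
    return agg([estimate(seed) for seed in range(n_splits)])


def ols_learner(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Simple linear regression, a stand-in for any machine learning method."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((p - mx) * (q - my) for p, q in zip(xs, ys))
             / sum((p - mx) ** 2 for p in xs))
    return lambda v: my + slope * (v - mx)
```

A CATE or ATE estimator would call `cross_fit_predict` once per nuisance function inside a function of the seed. `average_over_splits` would then aggregate that function over several seeds.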

Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance

The results imply that sample-splitting and cross-fitting are beneficial in large samples for bias reduction and efficiency of the meta-learners, respectively, whereas full-sample estimation is preferable in small samples.

CATE meets ML - Conditional Average Treatment Effect and Machine Learning

For treatment effects, one of the core issues in modern econometric analysis, prediction and estimation are two sides of the same coin. As it turns out, machine learning methods are the tool for

Evaluating sensitivity to classification uncertainty in latent subgroup effect analyses

Increasing attention is being given to assessing treatment effect heterogeneity among individuals belonging to qualitatively different latent subgroups. Inference routinely proceeds by first

Leveraging Population Outcomes to Improve the Generalization of Experimental Results

Generalizing causal estimates in randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences.

CATE meets ML

This tutorial gives an overview of novel methods, explains them in detail, and applies them via Quantlets in real data applications to study the effect of microcredit availability on the amount of money borrowed and whether 401(k) pension plan eligibility has an impact on net financial assets.

Data Analytics Driven Controlling: Bridging Statistical Modeling and Managerial Intuition

This work proposes a new automatic procedure that selects adaptive windows in count data sets by detecting significant changes in the intensity of events, and it provides guidance for the a priori selection of fixed windows for forecasting.

Blockchain Mechanism and Distributional Characteristics of Cryptos

This paper provides crypto creators and users with a better understanding of the connection between blockchain protocol design and the distributional characteristics of cryptos.

Ring the Alarm! Electricity Markets, Renewables, and the Pandemic

  • D. Benatia
  • Economics, Engineering
  • SSRN Electronic Journal
  • 2020
The pandemic's impacts on European electricity markets have been enormous, especially in countries such as France with abundant production at near-zero marginal cost. This article provides an in-depth

Random Forest Estimation of the Ordered Choice Model

In econometrics, so-called ordered choice models are popular when the interest lies in estimating the probabilities of particular values of categorical outcome variables with an inherent ordering,



Double/Debiased Machine Learning for Treatment and Structural Parameters

This work revisits the classic semiparametric problem of inference on a low-dimensional parameter θ₀ in the presence of high-dimensional nuisance parameters η₀. It proves that double/debiased machine learning (DML) delivers point estimators that concentrate in an N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows the construction of valid confidence statements.
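Below is a minimal sketch of the cross-fitted partialling-out estimator for the partially linear model Y = θ₀D + g₀(X) + ε, D = m₀(X) + v. It assumes a scalar covariate and uses simple linear regression as a stand-in for the machine learning nuisance estimators. The residual-on-residual formula solves the Neyman-orthogonal moment (1/n) Σ (D - m̂(X))(Y - ℓ̂(X) - θ(D - m̂(X))) = 0 with ℓ₀(X) = E[Y | X]. Pooling the residuals from all folds into one moment corresponds to the paper's DML2 variant. The function names and the simulated data (θ₀ = 2) are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Simple linear regression on a scalar covariate (stand-in for an ML learner)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((p - mx) * (q - my) for p, q in zip(xs, ys))
             / sum((p - mx) ** 2 for p in xs))
    return lambda v: my + slope * (v - mx)


def cross_fit(x, target, ids: List[int], k: int) -> List[float]:
    """Out-of-fold predictions of E[target | x] for the given fold labels."""
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if ids[i] != fold]
        model = ols_fit([x[i] for i in train], [target[i] for i in train])
        for i in range(len(x)):
            if ids[i] == fold:
                preds[i] = model(x[i])
    return preds


def dml_plm(y, d, x, k: int = 5, seed: int = 0) -> float:
    """Cross-fitted partialling-out estimator of theta in Y = theta*D + g(X) + eps."""
    ids = [i % k for i in range(len(x))]
    random.Random(seed).shuffle(ids)  # same folds for both nuisances
    l_hat = cross_fit(x, y, ids, k)   # nuisance l(X) = E[Y | X]
    m_hat = cross_fit(x, d, ids, k)   # nuisance m(X) = E[D | X]
    u = [yi - li for yi, li in zip(y, l_hat)]  # outcome residuals
    v = [di - mi for di, mi in zip(d, m_hat)]  # treatment residuals
    # Solve (1/n) * sum v * (u - theta * v) = 0 for theta.
    return sum(vi * ui for vi, ui in zip(v, u)) / sum(vi * vi for vi in v)


# Hypothetical simulated data with theta = 2.
rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
theta_hat = dml_plm(y, d, x)
```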

Optimal doubly robust estimation of heterogeneous causal effects

A two-stage doubly robust CATE estimator is studied, and a generic model-free error bound is given. It is also shown that this estimator can be oracle efficient under even weaker conditions if it is used with a specialized form of sample splitting and careful choices of tuning parameters.
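The two-stage structure can be sketched as follows. The sketch assumes a binary covariate X ∈ {0, 1}, so that every nuisance function (propensity π, outcome regressions μ₀ and μ₁) and the second-stage regression reduce to cell means. Stage one fits the nuisances on one half of the sample. Stage two regresses the doubly robust pseudo-outcome (A - π(X)) / (π(X)(1 - π(X))) · (Y - μ_A(X)) + μ₁(X) - μ₀(X) on X in the other half. Swapping the halves and averaging gives a cross-fitted version. This is a plain two-fold split, not the specialized sample-splitting scheme analyzed in the paper. All names and the simulated data (true CATE of 1 at X = 0 and 3 at X = 1) are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def fit_nuisance(x, a, y, idx: List[int]) -> Dict[int, Tuple[float, float, float]]:
    """Per-cell estimates (pi, mu0, mu1) for a binary covariate, using rows idx only."""
    out = {}
    for cell in (0, 1):
        rows = [i for i in idx if x[i] == cell]
        treated = [y[i] for i in rows if a[i] == 1]
        control = [y[i] for i in rows if a[i] == 0]
        out[cell] = (len(treated) / len(rows), mean(control), mean(treated))
    return out


def dr_stage_two(x, a, y, idx: List[int], nuis) -> Dict[int, float]:
    """Regress the DR pseudo-outcome on X (here: its mean within each X cell)."""
    phi: Dict[int, List[float]] = {0: [], 1: []}
    for i in idx:
        pi, mu0, mu1 = nuis[x[i]]
        mu_a = mu1 if a[i] == 1 else mu0
        phi[x[i]].append((a[i] - pi) / (pi * (1 - pi)) * (y[i] - mu_a) + mu1 - mu0)
    return {cell: mean(vals) for cell, vals in phi.items()}


def dr_learner(x, a, y, seed: int = 0) -> Dict[int, float]:
    """Nuisances on one half, pseudo-outcome regression on the other,
    then swap the halves and average the two CATE estimates."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half1, half2 = idx[: len(idx) // 2], idx[len(idx) // 2:]
    fits = [dr_stage_two(x, a, y, est, fit_nuisance(x, a, y, train))
            for train, est in ((half1, half2), (half2, half1))]
    return {cell: mean([f[cell] for f in fits]) for cell in (0, 1)}


# Hypothetical confounded data: X raises both P(A = 1) and the outcome.
rng = random.Random(1)
n = 6000
x = [rng.randint(0, 1) for _ in range(n)]
a = [1 if rng.random() < 0.3 + 0.4 * xi else 0 for xi in x]
y = [1 + xi + ai * (1 + 2 * xi) + rng.gauss(0, 1) for xi, ai in zip(x, a)]
cate = dr_learner(x, a, y)
```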

Quasi-oracle estimation of heterogeneous treatment effects

This paper develops a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies that have a quasi-oracle property. It implements variants of this approach based on penalized regression, kernel ridge regression, and boosting, and finds promising performance relative to existing baselines.
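Here is a minimal sketch of the two-step idea for a linear effect model τ(x) = b₀ + b₁x without a penalty. It assumes a scalar covariate and uses a simple linear regression as the nuisance learner. Step one cross-fits m(x) = E[Y | X = x] and e(x) = P(A = 1 | X = x). Step two minimizes Σᵢ [(Yᵢ - m̂(Xᵢ)) - (Aᵢ - ê(Xᵢ))τ(Xᵢ)]², which here is an ordinary least squares problem solved through its 2×2 normal equations. The penalized regression, kernel ridge, and boosting variants would replace this second step. The function names and the simulated data are hypothetical: b₀ = 1, b₁ = 2, and the propensity is linear in X by construction.

```python
import random
from typing import Callable, List, Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Simple linear regression on a scalar covariate (stand-in for an ML learner)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((p - mx) * (q - my) for p, q in zip(xs, ys))
             / sum((p - mx) ** 2 for p in xs))
    return lambda v: my + slope * (v - mx)


def cross_fit(x, target, ids: List[int], k: int) -> List[float]:
    """Out-of-fold predictions of E[target | x] for the given fold labels."""
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if ids[i] != fold]
        model = ols_fit([x[i] for i in train], [target[i] for i in train])
        for i in range(len(x)):
            if ids[i] == fold:
                preds[i] = model(x[i])
    return preds


def r_learner_linear(x, a, y, k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """R-learner with tau(x) = b0 + b1 * x and no penalty term."""
    ids = [i % k for i in range(len(x))]
    random.Random(seed).shuffle(ids)
    m_hat = cross_fit(x, y, ids, k)                      # step 1: E[Y | X]
    e_hat = cross_fit(x, [float(t) for t in a], ids, k)  # step 1: P(A = 1 | X)
    u = [yi - mi for yi, mi in zip(y, m_hat)]
    v = [ai - ei for ai, ei in zip(a, e_hat)]
    # Step 2: least squares of u on the features (v, v * x) via normal equations.
    s11 = sum(vi * vi for vi in v)
    s12 = sum(vi * vi * xi for vi, xi in zip(v, x))
    s22 = sum(vi * vi * xi * xi for vi, xi in zip(v, x))
    t1 = sum(vi * ui for vi, ui in zip(v, u))
    t2 = sum(vi * xi * ui for vi, xi, ui in zip(v, x, u))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Hypothetical simulated data with tau(x) = 1 + 2x.
rng = random.Random(7)
n = 4000
x = [rng.uniform(-1, 1) for _ in range(n)]
a = [1 if rng.random() < 0.5 + 0.3 * xi else 0 for xi in x]
y = [xi + ai * (1 + 2 * xi) + rng.gauss(0, 1) for xi, ai in zip(x, a)]
b0, b1 = r_learner_linear(x, a, y)
```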

Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence

An Empirical Monte Carlo Study relies on arguably realistic data generating processes (DGPs) based on actual data. It is used to investigate the finite sample performance of causal machine learning estimators for heterogeneous causal effects at different aggregation levels.

Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects

The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively "shrink to homogeneity".

Machine Learning for Causal Inference: On the Use of Cross-fit Estimators

Doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimating the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.
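Below is a minimal sketch of a cross-fitted augmented inverse probability weighting (AIPW) estimator of the average causal effect. It assumes a binary covariate, so that the nuisance models reduce to cell means. These cell means stand in for the ensemble learners discussed in the paper. Each observation's influence-function value ψᵢ uses nuisances fit on the other folds. The ATE is the mean of the ψᵢ, and the standard error is their sample standard deviation divided by √n. Function names and the simulated data (true ATE of 2) are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Tuple


def cell_nuisance(x, a, y, rows: List[int]) -> Dict[int, Tuple[float, float, float]]:
    """Per-cell (pi, mu0, mu1) for a binary covariate, estimated on the given rows."""
    out = {}
    for cell in (0, 1):
        r = [i for i in rows if x[i] == cell]
        treated = [y[i] for i in r if a[i] == 1]
        control = [y[i] for i in r if a[i] == 0]
        out[cell] = (len(treated) / len(r), mean(control), mean(treated))
    return out


def cross_fit_aipw(x, a, y, k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Cross-fitted AIPW estimate of the ATE and its influence-function standard error."""
    n = len(x)
    ids = [i % k for i in range(n)]
    random.Random(seed).shuffle(ids)
    psi = [0.0] * n
    for fold in range(k):
        # Nuisances are fit on the other K-1 folds only.
        nuis = cell_nuisance(x, a, y, [i for i in range(n) if ids[i] != fold])
        for i in range(n):
            if ids[i] == fold:
                pi, mu0, mu1 = nuis[x[i]]
                psi[i] = (mu1 - mu0
                          + a[i] * (y[i] - mu1) / pi
                          - (1 - a[i]) * (y[i] - mu0) / (1 - pi))
    ate = mean(psi)
    se = math.sqrt(sum((p - ate) ** 2 for p in psi) / (n - 1) / n)
    return ate, se


# Hypothetical confounded data: X shifts both treatment probability and outcome.
rng = random.Random(3)
n = 4000
x = [rng.randint(0, 1) for _ in range(n)]
a = [1 if rng.random() < 0.3 + 0.4 * xi else 0 for xi in x]
y = [1 + xi + ai * (1 + 2 * xi) + rng.gauss(0, 1) for xi, ai in zip(x, a)]
ate, se = cross_fit_aipw(x, a, y)
```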

Nonparametric estimation of causal heterogeneity under high-dimensional confounding

This paper considers the practically important case of nonparametrically estimating heterogeneous average treatment effects that vary with a limited number of discrete and continuous covariates in a

Estimation of Conditional Average Treatment Effects With High-Dimensional Data

Given the unconfoundedness assumption, we propose new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function. In the first stage, the

On Asymptotically Efficient Estimation in Semiparametric Models

We present a general method that improves and modifies Bickel's (1982) construction of adaptive estimators