Corpus ID: 235421662

Machine Learning for Variance Reduction in Online Experiments

Authors: Yongyi Guo, Dominic Coey, Mikael Konutgan, Wenting Li, Ch. P. Schoener, Matt Goldman
We consider the problem of variance reduction in randomized controlled trials, through the use of covariates correlated with the outcome but independent of the treatment. We propose a machine learning regression-adjusted treatment effect estimator, which we call MLRATE. MLRATE uses machine learning predictors of the outcome to reduce estimator variance. It employs cross-fitting to avoid overfitting biases, and we prove consistency and asymptotic normality under general conditions. MLRATE is… 
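The estimator described above can be illustrated with a minimal pure-Python sketch. It assumes a single numeric covariate, uses a hypothetical binned-mean regressor as a stand-in for the machine learning predictor, cross-fits that predictor on the pooled sample while ignoring treatment, and then regresses the outcome on an intercept, the treatment indicator, and the cross-fitted prediction. The paper's exact specification (for example, whether it adds a treatment-by-prediction interaction, and how it computes standard errors) may differ, and every function name here is illustrative.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_binned_mean(xs, ys, n_bins=10):
    """Hypothetical stand-in for an ML regressor: mean of y within equal-width bins of x."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    fallback = mean(ys)  # used for bins with no training points
    bin_means = [s / c if c else fallback for s, c in zip(sums, counts)]

    def predict(x):
        b = min(max(int((x - lo) / width), 0), n_bins - 1)  # clamp out-of-range x
        return bin_means[b]

    return predict


def cross_fit_predictions(xs, ys, k=2, seed=0):
    """Predict each unit's outcome from a model trained only on the other folds."""
    n = len(xs)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    preds = [0.0] * n
    for f in range(k):
        held_out = set(order[f::k])
        train = [i for i in range(n) if i not in held_out]
        model = fit_binned_mean([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def mlrate_sketch(y, t, x, k=2, seed=0):
    """Coefficient on t from OLS of y on (1, t, g), where g is the cross-fitted prediction."""
    g = cross_fit_predictions(x, y, k, seed)  # treatment is not a model input
    n = len(y)
    ybar = {a: mean([y[i] for i in range(n) if t[i] == a]) for a in (0, 1)}
    gbar = {a: mean([g[i] for i in range(n) if t[i] == a]) for a in (0, 1)}
    # Frisch-Waugh-Lovell: the slope on g equals the pooled within-arm slope of y on g.
    num = sum((g[i] - gbar[t[i]]) * (y[i] - ybar[t[i]]) for i in range(n))
    den = sum((g[i] - gbar[t[i]]) ** 2 for i in range(n))
    theta = num / den
    return (ybar[1] - ybar[0]) - theta * (gbar[1] - gbar[0])
```

Because the prediction for each unit never uses that unit's own outcome, overfitting in the predictor does not leak into the adjustment, which is the role cross-fitting plays in the abstract.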


Variance Reduction for Experiments with One-Sided Triggering using CUPED

In online experimentation, trigger-dilute analysis is an approach to obtain more precise estimates of intent-to-treat (ITT) effects when the intervention is only exposed, or "triggered", for a small…

More Reviews May Not Help: Evidence from Incentivized First Reviews on Airbnb

Online reviews are typically written by volunteers and, as a consequence, information about seller quality may be under-provided in digital marketplaces. We study the extent of this under-provision…

Do Incentives to Review Help the Market? Evidence from a Field Experiment on Airbnb

Many online reputation systems operate by asking volunteers to write reviews for free. As a result, a large share of buyers do not review, and those who do review are self-selected. This can cause…



High-dimensional regression adjustments in randomized experiments

This work studies the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and shows that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect.

No-harm calibration for generalized Oaxaca-Blinder estimators

In randomized experiments, linear regression with baseline features can be used to form an estimate of the sample average treatment effect that is asymptotically no less efficient than the…
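As an illustration of the regression-adjusted estimate mentioned above, here is a minimal sketch of a generalized Oaxaca-Blinder estimator with one covariate: fit a separate least-squares line in each arm, impute both potential outcomes for every unit, and average the difference. The no-harm calibration step that the paper adds is not shown, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ols_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def oaxaca_blinder_linear(y, t, x):
    """Average over all units of the treated-arm fit minus the control-arm fit."""
    a1, b1 = ols_line([xi for xi, ti in zip(x, t) if ti == 1],
                      [yi for yi, ti in zip(y, t) if ti == 1])
    a0, b0 = ols_line([xi for xi, ti in zip(x, t) if ti == 0],
                      [yi for yi, ti in zip(y, t) if ti == 0])
    return mean([(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x])


# Treated arm lies on y = 1 + 2x, control arm on y = 0.5x; the mean covariate is 1.0.
estimate = oaxaca_blinder_linear(y=[1, 3, 5, 0, 1], t=[1, 1, 1, 0, 0], x=[0, 1, 2, 0, 2])
print(estimate)  # 2.5
```

Because each arm's fit includes an intercept, its residuals average to zero within that arm, so imputing only the missing potential outcome for each unit gives the same number. With linear fits the result also equals the coefficient on treatment in an OLS regression that interacts treatment with the centered covariate.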

Improving Treatment Effect Estimators Through Experiment Splitting

Using a dataset of 226 Facebook News Feed A/B tests, this work shows that a lasso estimator based on repeated experiment splitting has a 44% lower mean squared predictive error than the conventional, unshrunk treatment effect estimator and would lead to substantially improved launch decisions.

Cross-fitting and fast remainder rates for semiparametric estimation

There are many interesting and widely used estimators of a functional with finite semiparametric variance bound that depend on nonparametric estimators of nuisance functions. We use cross-fitting to…
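To make cross-fitting concrete, here is a minimal sketch for one such functional, the average treatment effect, using an augmented inverse-probability-weighted (doubly robust) score. It assumes a randomized experiment with known assignment probability e, so only the outcome models are nuisance functions, and it uses per-arm least-squares lines as stand-ins for nonparametric estimators. This is an illustrative choice, not necessarily the estimators the paper studies, and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope


def cross_fit_aipw(y, w, x, e=0.5, k=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    scores = [0.0] * n
    for f in range(k):
        held_out = order[f::k]
        held_set = set(held_out)
        train = [i for i in range(n) if i not in held_set]
        # Nuisance fits use the training folds only: one outcome line per arm.
        fits = {}
        for arm in (0, 1):
            rows = [i for i in train if w[i] == arm]
            fits[arm] = ols_line([x[i] for i in rows], [y[i] for i in rows])
        # Evaluate the doubly robust score on the held-out fold.
        for i in held_out:
            mu1 = fits[1][0] + fits[1][1] * x[i]
            mu0 = fits[0][0] + fits[0][1] * x[i]
            scores[i] = (mu1 - mu0
                         + w[i] * (y[i] - mu1) / e
                         - (1 - w[i]) * (y[i] - mu0) / (1 - e))
    return mean(scores)
```

Each score is evaluated with nuisance fits that never saw that unit, which is the property cross-fitting is designed to provide.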

Quasi-oracle estimation of heterogeneous treatment effects

This paper develops a general class of two-step algorithms with a quasi-oracle property for heterogeneous treatment effect estimation in observational studies, implements variants based on penalized regression, kernel ridge regression, and boosting, and finds promising performance relative to existing baselines.
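A minimal sketch of the two-step idea, under simplifying assumptions: a randomized experiment with known propensity e = 0.5 (so only the outcome model m(x) = E[Y | X = x] is estimated), a single covariate, a cross-fitted least-squares line for m, and an unpenalized linear effect model tau(x) = c0 + c1 * x fit by minimizing the residual-on-residual loss, the sum of ((y - m(x)) - (w - e) * tau(x))^2. The paper's variants use penalized regression, kernel ridge regression, or boosting for the second step, and handle estimated propensities; the function names here are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope


def cross_fit_outcome_model(x, y, k=2, seed=0):
    """Step 1: out-of-fold predictions of m(x) = E[Y | X = x], ignoring treatment."""
    n = len(x)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    m_hat = [0.0] * n
    for f in range(k):
        held_out = set(order[f::k])
        train = [i for i in range(n) if i not in held_out]
        a, b = ols_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            m_hat[i] = a + b * x[i]
    return m_hat


def r_learner_linear(y, w, x, e=0.5, k=2, seed=0):
    """Step 2: least squares of (y - m_hat) on (w - e) and (w - e) * x, no intercept."""
    m_hat = cross_fit_outcome_model(x, y, k, seed)
    s11 = s12 = s22 = r1 = r2 = 0.0
    for yi, wi, xi, mi in zip(y, w, x, m_hat):
        z1, z2 = wi - e, (wi - e) * xi  # regressors for c0 and c1
        resid = yi - mi
        s11 += z1 * z1
        s12 += z1 * z2
        s22 += z2 * z2
        r1 += z1 * resid
        r2 += z2 * resid
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations
    c0 = (s22 * r1 - s12 * r2) / det
    c1 = (s11 * r2 - s12 * r1) / det
    return c0, c1
```

Because the propensity is known in this sketch, errors in the fitted m are uncorrelated with the (w - e) regressors, so the effect estimates remain consistent even when a straight line is a poor fit for m.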

Generalized random forests

This work proposes a flexible, computationally efficient algorithm for growing generalized random forests, an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest, and an estimator for their asymptotic variance that enables valid confidence intervals.

Improving the sensitivity of online controlled experiments by utilizing pre-experiment data

This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
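The core CUPED adjustment is short enough to sketch directly. Assuming one pre-experiment covariate x per unit (for example, the same metric measured before the experiment), the adjusted outcome is y - theta * (x - mean(x)) with theta = cov(x, y) / var(x) computed on the pooled sample, and the treatment effect is the difference in means of the adjusted outcomes. The function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes and theta = cov(x, y) / var(x), pooled over arms."""
    xbar, ybar = mean(x), mean(y)
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    var = sum((xi - xbar) ** 2 for xi in x)
    theta = cov / var  # the 1/(n - 1) factors cancel
    return [yi - theta * (xi - xbar) for xi, yi in zip(x, y)], theta


def cuped_effect(y, x, t):
    """Difference in means of the adjusted outcome between treatment (1) and control (0)."""
    y_adj, _ = cuped_adjust(y, x)
    treated = [v for v, ti in zip(y_adj, t) if ti == 1]
    control = [v for v, ti in zip(y_adj, t) if ti == 0]
    return mean(treated) - mean(control)


# Outcomes that are an exact linear function of the covariate collapse to a constant.
adjusted, theta = cuped_adjust(y=[2, 4, 6, 8], x=[1, 2, 3, 4])
print(theta, adjusted)  # 2.0 [5.0, 5.0, 5.0, 5.0]
```

Because x is measured before treatment, it is independent of assignment, so the adjusted difference in means remains (asymptotically) unbiased while its variance shrinks by a factor of 1 - rho^2, where rho is the correlation between x and y.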

Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique

Freedman [Adv. in Appl. Math. 40 (2008) 180-193; Ann. Appl. Stat. 2 (2008) 176-196] critiqued ordinary least squares regression adjustment of estimated treatment effects in randomized experiments,…

Semiparametric theory and empirical processes in causal inference

In this paper we review important aspects of semiparametric theory and empirical processes that arise in causal inference problems. We begin with a brief introduction to the general problem of causal…

Covariate adjustment for two‐sample treatment comparisons in randomized clinical trials: A principled yet flexible approach

Applying the theory of semiparametrics leads naturally to a characterization of all treatment effect estimators and to principled, practically feasible methods for covariate adjustment that yield the desired gains in efficiency and allow covariate relationships to be identified and exploited while circumventing the usual concerns.