Corpus ID: 235421662

Machine Learning for Variance Reduction in Online Experiments

Authors: Yongyi Guo, Dominic Coey, Mikael Konutgan, Wenting Li, Ch. P. Schoener, Matt Goldman
We consider the problem of variance reduction in randomized controlled trials, through the use of covariates correlated with the outcome but independent of the treatment. We propose a machine learning regression-adjusted treatment effect estimator, which we call MLRATE. MLRATE uses machine learning predictors of the outcome to reduce estimator variance. It employs cross-fitting to avoid overfitting biases, and we prove consistency and asymptotic normality under general conditions. MLRATE is… 
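The abstract's core idea (out-of-fold outcome predictions used as a regression covariate) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the ordinary least squares stand-in for the machine learning predictor, and the fully interacted adjustment regression are all assumptions made for illustration; the abstract does not specify the exact form of the regression.

```python
import numpy as np


def cross_fit_predictions(X, y, n_folds=2, seed=0):
    """Out-of-fold predictions of y from X (hypothetical helper).

    A linear least-squares fit stands in for an arbitrary ML predictor.
    Each unit's prediction comes from a model that never saw that unit,
    which is the role cross-fitting plays in avoiding overfitting bias.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds      # random, near-equal fold labels
    X1 = np.column_stack([np.ones(n), X])     # add an intercept column
    g = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        g[test] = X1[test] @ beta
    return g


def regression_adjusted_effect(y, t, g):
    """Coefficient on t from OLS of y on [1, t, g_c, t * g_c].

    g_c is the centered prediction. The interacted form is an assumed,
    commonly used adjustment, not something stated in the abstract.
    """
    gc = g - g.mean()
    design = np.column_stack([np.ones(len(y)), t, gc, t * gc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]
```

Under these assumptions, the point estimate is `regression_adjusted_effect(y, t, cross_fit_predictions(X, y))`, where `t` is a 0/1 treatment indicator and `X` holds pre-treatment covariates. Valid confidence intervals would additionally need a standard error, which this sketch omits.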

Variance Reduction for Experiments with One-Sided Triggering using CUPED

In online experimentation, trigger-dilute analysis is an approach to obtain more precise estimates of intent-to-treat (ITT) effects when the intervention is only exposed, or "triggered", for a small

Adaptive A/B Tests and Simultaneous Treatment Parameter Optimization

Constructing asymptotically valid confidence intervals through a valid central limit theorem is crucial for A/B tests, where a classical goal is to statistically assert whether a treatment plan is

When Less is More: Using Short-Term Signals to Overcome Systematic Bias in Long-Run Targeting

Firms are increasingly interested in developing targeted interventions for customers with the best response. Doing so requires firms to identify differences in customer sensitivity, which they often

Do Incentives to Review Help the Market? Evidence from a Field Experiment on Airbnb

Many online reputation systems operate by asking volunteers to write reviews for free. As a result, a large share of buyers do not review, and those who do review are self-selected. This can cause

More Reviews May Not Help: Evidence from Incentivized First Reviews on Airbnb

Online reviews are typically written by volunteers and, as a consequence, information about seller quality may be under-provided in digital marketplaces. We study the extent of this under-provision



High-dimensional regression adjustments in randomized experiments

This work studies the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and shows that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect.

Double/Debiased Machine Learning for Treatment and Structural Parameters

This work revisits the classic semiparametric problem of inference on a low dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0 and proves that DML delivers point estimators that concentrate in a N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.

Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain

A fully data-driven method for choosing the user-specified penalty that must be provided in obtaining LASSO and Post-LASSO estimates is provided and its asymptotic validity under non-Gaussian, heteroscedastic disturbances is established.

No-harm calibration for generalized Oaxaca-Blinder estimators.

In randomized experiments, linear regression with baseline features can be used to form an estimate of the sample average treatment effect that is asymptotically no less efficient than the

Improving Treatment Effect Estimators Through Experiment Splitting

Using a dataset of 226 Facebook News Feed A/B tests, it is shown that a lasso estimator based on repeated experiment splitting has a 44% lower mean squared predictive error than the conventional, unshrunk treatment effect estimator, and would lead to substantially improved launch decisions over both.

Cross-fitting and fast remainder rates for semiparametric estimation

There are many interesting and widely used estimators of a functional with finite semi-parametric variance bound that depend on nonparametric estimators of nuisance functions. We use cross-fitting to

Quasi-oracle estimation of heterogeneous treatment effects

This paper develops a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies that have a quasi-oracle property, and implements variants of this approach based on penalized regression, kernel ridge regression, and boosting, and find promising performance relative to existing baselines.

Generalized random forests

A flexible, computationally efficient algorithm for growing generalized random forests, an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest, and an estimator for their asymptotic variance that enables valid confidence intervals are proposed.

Efficiency Study of Estimators for a Treatment Effect in a Pretest–Posttest Trial

Several possible methods used to evaluate treatment effects in a randomized pretest–posttest trial with two treatment groups are the two-sample t test, the paired t test, analysis of covariance I

Improving the sensitivity of online controlled experiments by utilizing pre-experiment data

This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
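As a rough illustration of how pre-experiment data can reduce metric variability, the following sketch applies a control-variate style adjustment with a single pre-period covariate. The function name `cuped_adjust` and the single-covariate form are assumptions for illustration, not a reproduction of the cited work's implementation.

```python
import numpy as np


def cuped_adjust(y, x_pre):
    """Return y adjusted by a pre-experiment covariate (hypothetical helper).

    theta is the least-squares slope of y on x_pre. Subtracting
    theta * (x_pre - mean(x_pre)) leaves the sample mean of y unchanged
    and removes the part of its variance explained by x_pre.
    """
    theta = np.cov(x_pre, y, ddof=1)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())
```

In an experiment, `theta` would typically be estimated on pooled treatment and control data, and the treatment effect estimated as the difference in means of the adjusted metric between the two groups.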