Improving the sensitivity of online controlled experiments by utilizing pre-experiment data

@inproceedings{Deng2013ImprovingTS,
  title={Improving the sensitivity of online controlled experiments by utilizing pre-experiment data},
  author={Alex Deng and Ya Xu and Ron Kohavi and Toby Walker},
  booktitle={WSDM '13},
  year={2013}
}
Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have very significant business implications. At Bing it is not uncommon to see experiments that impact annual revenue by millions of dollars, even tens of millions of dollars, either positively or negatively. With thousands of experiments… 
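The paper's technique, CUPED (Controlled-experiment Using Pre-Experiment Data), reduces metric variance by adjusting the in-experiment metric with a correlated pre-experiment covariate. A minimal sketch on simulated data (all numbers hypothetical, chosen only to illustrate the adjustment):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a metric with a strong pre-experiment covariate
# (e.g. a user's pre-period engagement predicts in-period engagement).
n = 10_000
pre = rng.normal(10.0, 2.0, n)    # pre-experiment metric X
post = 0.8 * pre + rng.normal(0.0, 1.0, n)  # in-experiment metric Y

# CUPED adjustment: Y_cuped = Y - theta * (X - mean(X)),
# with theta = cov(X, Y) / var(X). Centering X preserves E[Y_cuped] = E[Y],
# so the treatment-effect estimate is unchanged while its variance shrinks.
theta = np.cov(pre, post)[0, 1] / np.var(pre, ddof=1)
post_cuped = post - theta * (pre - pre.mean())

print(np.var(post), np.var(post_cuped))  # adjusted variance is much smaller
```

The variance shrinks by a factor of roughly 1 − ρ², where ρ is the correlation between the pre- and in-experiment metrics, which is why a strongly predictive pre-period covariate is so valuable.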

Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix
TLDR
This paper describes an innovative implementation of stratified sampling at Netflix, where users are assigned to experiments in real time, discusses some surprising challenges with the implementation, and recommends using post-assignment variance reduction techniques such as post-stratification and CUPED instead of at-assignment techniques such as stratified sampling in large-scale controlled experiments.
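The post-assignment adjustment recommended above can be sketched as post-stratification: estimate the treatment effect within each stratum, then combine the within-stratum estimates weighted by stratum size. A hedged sketch on simulated data (strata, effect size, and sample sizes are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Users fall into strata (e.g. country or platform) that shift the metric.
n = 20_000
stratum = rng.integers(0, 4, n)                  # 4 hypothetical strata
treat = rng.integers(0, 2, n)                    # random assignment
base = np.array([1.0, 3.0, 5.0, 7.0])[stratum]  # stratum-level mean shift
y = base + 0.5 * treat + rng.normal(0.0, 1.0, n)  # true effect = 0.5

# Naive difference-in-means estimator.
naive = y[treat == 1].mean() - y[treat == 0].mean()

# Post-stratified estimator: within-stratum differences,
# weighted by each stratum's share of the population.
effect = 0.0
for s in range(4):
    m = stratum == s
    w = m.mean()
    effect += w * (y[m & (treat == 1)].mean() - y[m & (treat == 0)].mean())

print(naive, effect)  # both near 0.5; the post-stratified estimate is less noisy
```

Because the between-stratum component of the variance is removed after assignment, this achieves much of the benefit of stratified sampling without the real-time assignment machinery the case study found challenging.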
Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments
TLDR
This work develops a general framework that is based on evaluation of the mean difference between the actual and the approximated values of the key performance metric and proposes a new class of methods based on advanced machine learning algorithms that have not been applied earlier to the problem of variance reduction.
Online controlled experiments at large scale
TLDR
This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.
Bias Variance Tradeoff in Analysis of Online Controlled Experiments
TLDR
This paper examines two common approaches for analyzing usage data collected from users within the time window of an experiment, which can differ in accuracy and power.
Pitfalls of long-term online controlled experiments
TLDR
Several examples of long-term experiments are shared and cookie stability, survivorship bias, selection bias, and perceived trends are discussed, and methodologies that can be used to partially address some of these issues are shared.
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
TLDR
In this tutorial, challenges, best practices, and pitfalls in evaluating experiment results are discussed, focusing on both lessons learned and practical guidelines as well as open research questions.
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
TLDR
This tutorial will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
TLDR
This paper shares twelve common metric interpretation pitfalls, illustrating each with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid the pitfalls.
Designing and Analyzing A/B Tests in an Online Marketplace
TLDR
The approach is to adopt a regression model for the experiment response, test whether interference between treatment and control constitutes a statistically significant regressor, and, when it does, advocate changing the randomization, with a system developed in support of that.
...
...

References

SHOWING 1-10 OF 23 REFERENCES
Trustworthy online controlled experiments: five puzzling outcomes explained
TLDR
The topics covered include: the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects, which should help readers increase the trustworthiness of the results coming out of controlled experiments.
Controlled experiments on the web: survey and practical guide
TLDR
This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.
Online Experimentation at Microsoft
TLDR
The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.
Choice of the Randomization Unit in Online Controlled Experiment
TLDR
This paper compares the two experiment units and provides a method to correctly analyze a page view randomization experiment in a two layer randomization framework.
Overlapping experiment infrastructure: more, better, faster experimentation
TLDR
Google's overlapping experiment infrastructure is described, and the associated tools and educational processes required to use it effectively are discussed, which can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.
Semiparametric Estimation of Treatment Effect in a Pretest-Posttest Study with Missing Data.
TLDR
It is shown how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class, and how the theoretical results translate into practice.
Large-scale validation and analysis of interleaved search evaluation
TLDR
This paper provides a comprehensive analysis of interleaving using data from two major commercial search engines and a retrieval system for scientific literature, and analyzes the agreement ofinterleaving with manual relevance judgments and observational implicit feedback measures.
Statistics for Experimenters: Design, Innovation and Discovery
TLDR
This introductory textbook continues to teach the philosophy of design and analysis of experiments as well as the "nuts and bolts" in a way that is accessible to both students and industrial practitioners; readers will find clear and well-motivated examples, excellent discussions of underlying statistical concepts, and practical guidelines for experimentation.
Stochastic simulation
  • B. Ripley
  • Computer Science
    Wiley series in probability and mathematical statistics : applied probability and statistics
  • 1987
TLDR
Brian D. Ripley's Stochastic Simulation is a short, yet ambitious, survey of modern simulation techniques, and three themes run throughout the book.
Introduction to Design and Analysis : A Student's Handbook
Part 1 Experimental design and preliminary data analysis: introduction to experimental design - getting started, how do psychologists conduct research?, experimental research design, summary,
...
...