Trustworthy online controlled experiments: five puzzling outcomes explained

@inproceedings{Kohavi2012TrustworthyOC,
  title={Trustworthy online controlled experiments: five puzzling outcomes explained},
  author={Ron Kohavi and Alex Deng and Brian Frasca and Roger Longbotham and Toby Walker and Ya Xu},
  booktitle={KDD},
  year={2012}
}
Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale--thousands of experiments now--has taught us many lessons. These exemplify the… 

Trustworthy Online Controlled Experiments
TLDR
This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft teaches readers how to accelerate innovation using trustworthy online controlled experiments, or A/B tests, and so improve the way they make data-driven decisions.
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
TLDR
This paper shares twelve common metric interpretation pitfalls, illustrating each pitfall with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall.
Improving the sensitivity of online controlled experiments by utilizing pre-experiment data
TLDR
This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
Online controlled experiments at large scale
TLDR
This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.
Online Controlled Experiments and A/B Tests
TLDR
Online controlled experiments are now considered an indispensable tool, and their use is growing for startups and smaller websites, especially in combination with Agile software development.
Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale
TLDR
It is revealed that most of the experiment analysis happens before OCEs are even started, and the key analysis steps are summarized in three checklists which can enable novice data scientists and software engineers to become more autonomous in setting-up and analyzing experiments.
Top Challenges from the first Practical Online Controlled Experiments Summit
TLDR
This is the first paper to present the top challenges faced across the industry in running OCEs at scale, along with some common solutions.
Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization
TLDR
This paper presents some of the experimentation pitfalls that are particularly important for personalization features and aims to increase experimenters' awareness of them, leading to improved quality and reliability of the results.
Designing and Analyzing A/B Tests in an Online Marketplace
TLDR
The approach is to adopt a regression model for the experiment response and test whether the interference between test and control is a statistically significant regressor; the authors advocate changing the randomization and developing a system to support that.
Effective Online Controlled Experiment Analysis at Large Scale
TLDR
The standard process of experiment analysis is described, and an artifact is introduced to improve the effectiveness and comprehensiveness of this process.

References

Online Experimentation at Microsoft
TLDR
The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.
Controlled experiments on the web: survey and practical guide
TLDR
This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.
Practical guide to controlled experiments on the web: listen to your customers not to the hippo
TLDR
This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.
Seven pitfalls to avoid when running controlled experiments on the web
TLDR
The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate standard deviation and statistical power can be applied and ignoring robots in analysis (a problem unique to online settings).
Overlapping experiment infrastructure: more, better, faster experimentation
TLDR
Google's overlapping experiment infrastructure is described, and the associated tools and educational processes required to use it effectively are discussed, which can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.
Do It Wrong Quickly: How the Web Changes the Old Marketing Rules
"What's the one thing companies care about? Conversion. Getting potential customers to convert into real, actual, customers. But how do you do that in a world of Facebook, Google, YouTube, blogs, and…
Statistics for Experimenters: Design, Innovation and Discovery
TLDR
This introductory textbook continues to teach the philosophy of design and analysis of experiments as well as the “nuts and bolts” in a way that is accessible to both students and industrial practitioners and finds clear and well-motivated examples, excellent discussions of underlying statistical concepts and practical guidelines for experimentation.
A/B Testing Using the Negative Binomial Distribution in an Internet Search Application
A/B testing plays an important role in modern applications, particularly on the internet, as it helps businesses to optimize their user experience to maximize usage and profits. In this paper we…
Statistics for experimenters : design, innovation, and discovery
Preface to the Second Edition. Chapter 1. Catalyzing the Generation of Knowledge. 1.1. The Learning Process. 1.2. Important Considerations. 1.3. The Experimenter's Problem and Statistical Methods.…
Confirmation Bias: A Ubiquitous Phenomenon in Many Guises
Confirmation bias, as the term is typically used in the psychological literature, connotes the seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a…