Controlled experiments on the web: survey and practical guide

@article{Kohavi2008ControlledEO,
  title={Controlled experiments on the web: survey and practical guide},
  author={Ron Kohavi and Roger Longbotham and Dan Sommerfield and Randal M. Henne},
  journal={Data Mining and Knowledge Discovery},
  year={2008},
  volume={18},
  pages={140--181}
}
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end… 
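In its simplest form, the causal comparison the abstract describes is a test of whether a metric's mean differs between Control and Treatment. The sketch below illustrates that test and is not the survey's own code. It assumes users are randomized independently and that each group is large enough for a normal approximation. The function name `two_sample_z_test` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

def two_sample_z_test(control, treatment):
    """Large-sample test of mean(treatment) - mean(control).

    Returns (delta, z, p_value). The standard error uses each group's own
    sample variance (Welch-style), assuming independent randomized users.
    """
    n_c, n_t = len(control), len(treatment)
    delta = fmean(treatment) - fmean(control)
    se = math.sqrt(variance(control) / n_c + variance(treatment) / n_t)
    z = delta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return delta, z, p_value

# Hypothetical per-user conversion flags (1 = converted).
control = [0, 1, 0, 1, 0, 1, 0, 1]
treatment = [1, 1, 0, 1, 1, 1, 0, 1]
delta, z, p = two_sample_z_test(control, treatment)
# delta = 0.25, z ≈ 1.0, p ≈ 0.32
```

The toy data are made up and far too small for the normal approximation to be reliable. They only show the call shape.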
Online Controlled Experiments and A/B Tests
TLDR
Online controlled experiments are now considered an indispensable tool, and their use is growing for startups and smaller websites, especially in combination with Agile software development.
Seven pitfalls to avoid when running controlled experiments on the web
TLDR
The pitfalls cover a wide range of topics. Examples include assuming that common statistical formulas for standard deviation and statistical power can be applied, and ignoring robots in analysis (a problem unique to online settings).
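Here is a concrete case of the standard-deviation pitfall. Suppose users are randomized but the metric is a ratio over page views, such as click-through rate. Page views from the same user are correlated, so the textbook binomial formula can understate the variance. One common remedy is the delta method, which treats the user as the unit of analysis. The sketch below uses it as an illustrative choice and does not necessarily reproduce the cited paper's exact procedure. The function names are hypothetical.

```python
import math
from statistics import fmean

def ratio_metric_se(numerators, denominators):
    """Delta-method SE of sum(numerators) / sum(denominators), users as the unit.

    numerators[i] and denominators[i] are per-user totals (e.g. clicks and
    page views). Var(R) ~ (var_y - 2*R*cov_xy + R^2*var_x) / (n * mean_x^2).
    """
    n = len(numerators)
    mean_x, mean_y = fmean(denominators), fmean(numerators)
    ratio = mean_y / mean_x
    # Sample variances and covariance across users, not across page views.
    var_x = sum((x - mean_x) ** 2 for x in denominators) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in numerators) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(denominators, numerators)) / (n - 1)
    var_ratio = (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (n * mean_x ** 2)
    return ratio, math.sqrt(var_ratio)

def naive_binomial_se(numerators, denominators):
    """Textbook SE that treats every page view as independent (the pitfall)."""
    p = sum(numerators) / sum(denominators)
    return math.sqrt(p * (1 - p) / sum(denominators))

# Hypothetical per-user totals: one heavy clicker among four users.
clicks = [0, 0, 10, 0]
page_views = [10, 10, 10, 10]
ratio, se = ratio_metric_se(clicks, page_views)  # ratio = 0.25, se = 0.25
naive = naive_binomial_se(clicks, page_views)    # about 0.068, far too small
```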
Designing and deploying online field experiments
TLDR
PlanOut, a language for online field experiments, separates experimental design from application code. Experimenters can use it to concisely describe a range of designs, from common "A/B tests" and factorial designs to more complex designs involving conditional logic or multiple experimental units.
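A central idea in PlanOut is that assignment is computed deterministically from the experimental unit instead of being stored per user. That idea fits in a few lines of plain Python. This is not PlanOut's API: the sketch only assumes a salted hash (SHA-1 here) of the unit id, and the experiment and parameter names are hypothetical. The 2x2 example shows how separate salts give independent factors, as in a factorial design.

```python
import hashlib

def assign(unit_id, salt, choices):
    """Deterministically map a unit (e.g. a user id) to one of `choices`.

    Hashing salt + unit id gives a stable, roughly uniform assignment with no
    per-user storage; a different salt yields an independent assignment.
    """
    key = f"{salt}.{unit_id}".encode("utf-8")
    value = int(hashlib.sha1(key).hexdigest()[:15], 16)  # first 15 hex digits
    return choices[value % len(choices)]

def button_design(user_id):
    """Hypothetical 2x2 factorial design with two independently hashed factors."""
    return {
        "button_color": assign(user_id, "button_exp.color", ["blue", "green"]),
        "button_text": assign(user_id, "button_exp.text", ["Buy", "Purchase"]),
    }

design = button_design("user-42")  # same result every time for this user
```

Because no assignment table is needed, any server can recompute a user's variant on demand.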
Online Experimentation at Microsoft
TLDR
This paper shares lessons and challenges, with an emphasis on the cultural aspects and the value of controlled experiments.
Trustworthy Online Controlled Experiments
TLDR
This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation with trustworthy online controlled experiments, or A/B tests, and improve the way you make data-driven decisions.
Unexpected results in online controlled experiments
TLDR
Online controlled experiments are now used frequently. They rely on software capabilities like ramp-up (exposure control) and run on large server farms with millions of users. This work shares several real examples of unexpected results from such experiments and the lessons learned.
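Ramp-up (exposure control) works by giving each user a fixed hash bucket and exposing the buckets below a threshold that grows over time. The sketch below makes several assumptions: the 10,000-bucket granularity, the SHA-256 hash, the salt, and the ramp schedule are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 10_000  # hypothetical granularity: one bucket = 0.01% of traffic

def bucket(unit_id, salt):
    """Stable bucket in [0, NUM_BUCKETS); it does not depend on the ramp level."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def is_exposed(unit_id, salt, exposure_pct):
    """True if the unit sees the Treatment at the current ramp level (0-100)."""
    return bucket(unit_id, salt) < exposure_pct / 100 * NUM_BUCKETS

# Hypothetical ramp schedule: start small, widen exposure if no alarms fire.
RAMP_SCHEDULE_PCT = [1, 5, 20, 50]
exposed_counts = [sum(is_exposed(u, "new_ranker", pct) for u in range(5000))
                  for pct in RAMP_SCHEDULE_PCT]
```

A unit's bucket never changes, so users exposed at 5% are still exposed at 20%. Each user's experience therefore stays consistent as exposure widens.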
Trustworthy online controlled experiments: five puzzling outcomes explained
TLDR
The topics covered include the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects. Explaining these puzzling outcomes should help readers improve the trustworthiness of results from controlled experiments.
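Experiment length depends largely on the sample size needed for adequate power. Suppose a two-sided test at significance level α should have power 1 − β to detect a difference Δ in a metric with standard deviation σ. Under a normal approximation, it needs roughly n = 2(z_{1−α/2} + z_{1−β})²σ²/Δ² users per variant. At α = 0.05 and 80% power the constant is about 15.7, close to the 16σ²/Δ² rule of thumb in van Belle's Statistical Rules of Thumb. The function below is a hypothetical helper for that formula and assumes equal group sizes.

```python
import math
from statistics import NormalDist

def users_per_variant(sigma, delta, alpha=0.05, power=0.8):
    """Approximate users needed in EACH of Control and Treatment.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2, for a
    two-sided test on a difference in means with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical metric with standard deviation 1.0; detect a shift of 0.1.
n = users_per_variant(sigma=1.0, delta=0.1)  # 1570 per variant
```

For a rough run length, divide the total 2n by the number of new eligible users per day. This treats each day's eligible users as new and ignores repeat visitors.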
The Rise of the Super Experiment
TLDR
This work introduces the Super Experiment Framework, which describes how internet-scale experiments can inform and be informed by classroom and lab experiments. The authors apply it to a research project on learning games for mathematics that collects hundreds of thousands of data trials weekly.
Improving the sensitivity of online controlled experiments by utilizing pre-experiment data
TLDR
This work proposes CUPED (Controlled-experiment Using Pre-Experiment Data). The approach uses data from the pre-experiment period to reduce metric variability and hence improve experiment sensitivity, and it applies to a wide variety of key business metrics.
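The CUPED adjustment corrects each user's in-experiment metric Y with a pre-experiment covariate X: Y_cv = Y − θ(X − mean(X)), where θ = cov(X, Y)/var(X). The mean stays unchanged while the part of the variance explained by X is removed. The function name and toy data below are hypothetical.

```python
from statistics import fmean

def cuped_adjust(y, x):
    """Return y_cv = y - theta * (x - mean(x)) with theta = cov(x, y) / var(x).

    y: per-user metric during the experiment.
    x: the same users' pre-experiment covariate (often the same metric).
    """
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    theta = cov_xy / var_x  # the variance-minimizing coefficient
    return [b - theta * (a - mean_x) for a, b in zip(x, y)]

# Hypothetical users whose in-experiment metric tracks their pre-period value.
pre_period = [1, 2, 3, 4, 5]
in_experiment = [2.1, 3.9, 6.2, 7.8, 10.0]
adjusted = cuped_adjust(in_experiment, pre_period)
```

In a real comparison, estimate θ once and apply it to both Control and Treatment, for example by calling the function on the pooled data and splitting the result by group. Using a common θ keeps the treatment-effect estimate unbiased.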
Online controlled experiments at large scale
TLDR
This work discusses why negative experiments, which degrade the user experience in the short term, should be run given their learning value and long-term benefits. It also presents the design of a highly scalable system that handles data at massive scale: hundreds of concurrent experiments, each containing millions of users.
...

References

Showing 1-10 of 59 references
Practical guide to controlled experiments on the web: listen to your customers not to the hippo
TLDR
This work provides a practical guide to conducting online experiments. It shares key lessons that will help practitioners run trustworthy controlled experiments, covering statistical power, sample size, and techniques for variance reduction.
Statistics for Experimenters: Design, Innovation and Discovery
TLDR
This introductory textbook continues to teach the philosophy of design and analysis of experiments, as well as the “nuts and bolts,” in a way that is accessible to both students and industrial practitioners. It offers clear and well-motivated examples, excellent discussions of underlying statistical concepts, and practical guidelines for experimentation.
Introduction to Design and Analysis: A Student's Handbook
Part 1. Experimental design and preliminary data analysis: introduction to experimental design - getting started, how do psychologists conduct research?, experimental research design, summary,
Experimentation Matters: Unlocking the Potential of New Technologies for Innovation
Every company's ability to innovate depends on a process of experimentation whereby new products and services are created and existing ones improved. But the cost of experimentation often limits
How Large Does n Have to be for Z and t Intervals?
Abstract Students invariably ask the question “How large does n have to be for Z and t intervals to give appropriate coverage probabilities?” In this article we review the role of , where (X) is the
Web Site Measurement Hacks
TLDR
By examining how real-world companies use analytics successfully, Web Site Measurement Hacks demonstrates how you, too, can accurately measure your Web site's overall effectiveness.
Ten Supplementary Analyses to Improve E-commerce Web Sites
TLDR
This work describes how to construct a customer signature and the challenges businesses face in building one. It also offers several recommendations for supplementary analyses that have proved very useful in practice.
Discovery of Web Robot Sessions Based on their Navigational Patterns
TLDR
Experimental results on Web server logs from a Computer Science department show that highly accurate classification models can be built from the navigational patterns in click-stream data. These models can determine whether a session was generated by a Web robot.
Statistical rules of thumb
Preface to the Second Edition. Preface to the First Edition. Acronyms. 1. The Basics. 1.1 Four Basic Questions. 1.2 Observation is Selection. 1.3 Replicate to Characterize Variability. 1.4
Statistical Design and Analysis of Experiments, with Applications to Engineering and Science
Preface. PART I: FUNDAMENTAL STATISTICAL CONCEPTS. Statistics in Engineering and Science. Fundamentals of Statistical Inference. Inferences on Means and Standard Deviations. PART II: DESIGN AND
...