Top Challenges from the first Practical Online Controlled Experiments Summit

@article{Gupta2019TopCF,
  title={Top Challenges from the first Practical Online Controlled Experiments Summit},
  author={Somit Gupta and Ron Kohavi and Diane Tang and Ya Xu},
  journal={SIGKDD Explor.},
  year={2019},
  volume={21},
  pages={20-35}
}
Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale and encourage further academic and industrial exploration, representatives with experience in large-scale experimentation from thirteen different… 
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
TLDR
This tutorial will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.
Trustworthy Online Controlled Experiments
TLDR
This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests, and improve the way you make data-driven decisions.
Online Controlled Experiments at Large Scale in Society 5.0
Ron Kohavi. In Optimization in Large Scale Problems, 2019.
TLDR
This chapter shows how online controlled experiments (A/B tests) can be run at large scale to provide trustworthy, reliable assessments of an implementation's impact on key metrics of interest.
How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments
TLDR
Several scalable methods are introduced, including user-level performance metric calculation and imputation and matching for missing metric values; the pitfalls they address arise from strong heterogeneity in both mobile devices and user engagement, and from self-selection bias caused by post-treatment changes in user engagement.
Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective
Practitioners and academics have long appreciated the benefits that experimentation brings to firms. For online web-facing firms, however, it remains challenging to balance covariates across treatment groups when experimental units arrive sequentially…
Covariance Estimation and its Application in Large-Scale Online Controlled Experiments
TLDR
A novel algorithm for estimating the covariance of online metrics is proposed; it introduces more flexibility into the trade-off between computational cost and precision in covariance estimation, reducing the cost of metric calculation in large-scale settings.
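(The paper's own algorithm is not described in this snippet. As background on the computational-cost side of that trade-off, here is a minimal sketch of a standard one-pass, Welford-style covariance estimator that consumes per-user metric pairs from a stream in O(1) memory; the class and method names are illustrative.)

```python
class StreamingCovariance:
    """One-pass (Welford-style) covariance estimate over a stream of
    (x, y) metric pairs, so pairs never need to be stored in memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.c = 0.0  # running co-moment: sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x              # deviation from the *old* x mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.c += dx * (y - self.mean_y)  # uses the *updated* y mean

    def cov(self) -> float:
        return self.c / (self.n - 1)      # sample covariance

est = StreamingCovariance()
for x, y in [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]:
    est.update(x, y)
print(est.cov())  # ~1.95
```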
Software Architecture: 14th European Conference, ECSA 2020, L'Aquila, Italy, September 14–18, 2020, Proceedings
TLDR
This talk discusses some of the challenges of managing AI-based complex and dependable systems, illustrates a case of cyber-physical systems, and offers ideas for new research in software engineering, including software architecture, for AI engineering.

References

Showing 1-10 of 71 references
Online controlled experiments at large scale
TLDR
This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.
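(One common way such platforms keep hundreds of concurrent experiments independent is deterministic hash-based assignment, salting the hash with a per-experiment identifier so that splits are reproducible without storing any assignments. A minimal sketch, assuming SHA-256; function and identifier names are illustrative.)

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically assign a user to a variant by hashing the user id
    salted with the experiment id; each experiment gets an independent,
    reproducible split with no assignment storage needed."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000                # 1000 fine-grained buckets
    return variants[bucket * len(variants) // 1000]

print(assign_variant("user-42", "exp-homepage-cta"))  # stable across calls
```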
Online Controlled Experimentation at Scale: An Empirical Survey on the Current State of A/B Testing
TLDR
The findings show, among other things, that companies typically develop in-house experimentation platforms, that these platforms are at various levels of maturity, and that designing key metrics - Overall Evaluation Criteria - remains the key challenge for successful experimentation.
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
TLDR
This paper shares twelve common metric interpretation pitfalls, illustrating each with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid these pitfalls.
Pitfalls of long-term online controlled experiments
TLDR
Several examples of long-term experiments are shared and cookie stability, survivorship bias, selection bias, and perceived trends are discussed, and methodologies that can be used to partially address some of these issues are shared.
Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned
TLDR
This paper focuses on the topic of how to develop meaningful and useful metrics for online services in their online experiments, and shows how data-driven techniques and criteria can be applied in metric development process.
Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale
TLDR
It is revealed that most experiment analysis happens before OCEs are even started, and the key analysis steps are summarized in three checklists that can enable novice data scientists and software engineers to become more autonomous in setting up and analyzing experiments.
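(A canonical item on such pre-analysis checklists is the Sample Ratio Mismatch (SRM) test, which compares observed assignment counts against the configured split with a chi-squared goodness-of-fit test. A minimal sketch; the function name and alpha threshold are illustrative, not the paper's exact checklist.)

```python
from scipy.stats import chisquare

def srm_check(control_users: int, treatment_users: int,
              expected_split=(0.5, 0.5), alpha: float = 0.001):
    """Sample Ratio Mismatch check: flag the experiment when observed
    assignment counts deviate significantly from the configured split."""
    total = control_users + treatment_users
    expected = [total * p for p in expected_split]
    _, p_value = chisquare([control_users, treatment_users], f_exp=expected)
    # True => likely data-quality bug; the experiment results should not be trusted
    return p_value, p_value < alpha

p, mismatch = srm_check(50_000, 50_600)
print(f"p = {p:.4f}, SRM detected: {mismatch}")
```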
Trustworthy online controlled experiments: five puzzling outcomes explained
TLDR
The topics covered include the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects; the explanations should help readers increase the trustworthiness of the results coming out of controlled experiments.
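(On experiment length and power specifically: the required sample size grows quadratically as the detectable effect shrinks, which is why small lifts demand long-running, large experiments. A hedged illustration with statsmodels; the baseline numbers are made up.)

```python
from statsmodels.stats.power import NormalIndPower

# Hypothetical metric: baseline mean 0.20, standard deviation 0.40.
# How many users per variant to detect a 1% relative lift
# at alpha = 0.05 with 80% power?
baseline_mean, baseline_std = 0.20, 0.40
effect_size = (0.01 * baseline_mean) / baseline_std  # standardized effect, ~0.005
n = NormalIndPower().solve_power(effect_size=effect_size, alpha=0.05,
                                 power=0.8, alternative="two-sided")
print(f"~{n:,.0f} users per variant")  # on the order of hundreds of thousands
```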
Democratizing online controlled experiments at Booking.com
TLDR
This paper explains how an organization as large as Booking.com was able to truly and successfully democratize experimentation: by building a central repository of successes and failures to allow knowledge sharing, by providing a generic and extensible code library that enforces loose coupling between experimentation and business logic, by monitoring the quality and reliability of the data-gathering pipelines closely and transparently to build trust in the experimentation infrastructure, and by putting safeguards in place that enable anyone to have end-to-end ownership of their experiments.
Improving the sensitivity of online controlled experiments by utilizing pre-experiment data
TLDR
This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
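(The core of CUPED is a control-variate adjustment: subtract theta * (X - mean(X)) from the in-experiment metric Y, where X is the same metric measured on the same users before the experiment and theta = cov(X, Y) / var(X) minimizes the adjusted variance. A minimal sketch on synthetic data; all numbers are illustrative.)

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """CUPED adjustment: remove the component of the in-experiment
    metric y that is explained by the pre-experiment covariate x."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(10.0, 3.0, 10_000)            # pre-experiment metric
y = 0.8 * x + rng.normal(0.0, 1.0, 10_000)   # correlated in-experiment metric
print(np.var(y), np.var(cuped_adjust(y, x))) # adjusted variance is far smaller
```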
Online Experimentation at Microsoft
TLDR
The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.