Top Challenges from the first Practical Online Controlled Experiments Summit
@article{Gupta2019TopCF, title={Top Challenges from the first Practical Online Controlled Experiments Summit}, author={Somit Gupta and Ron Kohavi and Diane Tang and Ya Xu}, journal={SIGKDD Explor.}, year={2019}, volume={21}, pages={20-35} }
Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale and encourage further academic and industrial exploration, representatives with experience in large-scale experimentation from thirteen different…
63 Citations
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
- Computer Science
- WWW
- 2020
This tutorial will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
- Computer Science
- WSDM
- 2020
This tutorial will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
- Computer Science
- KDD
- 2019
In this tutorial, challenges, best practices, and pitfalls in evaluating experiment results are discussed, focusing on both lessons learned and practical guidelines as well as open research questions.
Trustworthy Online Controlled Experiments
- Computer Science
- 2020
This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft teaches how to accelerate innovation using trustworthy online controlled experiments, or A/B tests, and how to improve data-driven decision-making.
Online Controlled Experiments at Large Scale in Society 5.0
- Business
- Optimization in Large Scale Problems
- 2019
This chapter shows how online controlled experiments can be run at large scale using A/B tests, which provide trustworthy, reliable assessments of the impact of implementations on key metrics of interest.
How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments
- Computer Science
- WSDM
- 2021
Several scalable methods, including user-level performance metric calculation and imputation and matching for missing metric values, are introduced to address pitfalls that arise from strong heterogeneity in both mobile devices and user engagement and from self-selection bias caused by post-treatment user engagement changes.
No evidence of attraction effect among recommended options: A large-scale field experiment on an online flight aggregator
- Business
- Decis. Support Syst.
- 2022
Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective
- Mathematics
- 2022
Practitioners and academics have long appreciated the benefits that experimentation brings to firms. For online web-facing firms, however, it still remains challenging in balancing covariate…
Covariance Estimation and its Application in Large-Scale Online Controlled Experiments
- Computer Science
- 2021
A novel algorithm for estimating the covariance of online metrics is proposed. It introduces more flexibility in the trade-off between computational cost and precision in covariance estimation, and it reduces the computational cost of metric calculation in large-scale settings.
Software Architecture: 14th European Conference, ECSA 2020, L'Aquila, Italy, September 14–18, 2020, Proceedings
- Computer Science
- ECSA
- 2020
This talk discusses some of the challenges of managing AI-based complex and dependable systems, illustrates them with a case from cyber-physical systems, and gives some ideas for new research in software engineering, including software architecture, i.e. for AI engineering.
References
SHOWING 1-10 OF 71 REFERENCES
Online controlled experiments at large scale
- Computer Science
- KDD
- 2013
This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.
Online Controlled Experimentation at Scale: An Empirical Survey on the Current State of A/B Testing
- Computer Science, Business
- 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
- 2018
The findings show that, among others, companies typically develop in-house experimentation platforms, that these platforms are of various levels of maturity, and that designing key metrics - Overall Evaluation Criteria - remains the key challenge for successful experimentation.
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
- Computer Science
- KDD
- 2017
This paper shares twelve common metric interpretation pitfalls, illustrating each pitfall with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall.
Pitfalls of long-term online controlled experiments
- Business
- 2016 IEEE International Conference on Big Data (Big Data)
- 2016
Several examples of long-term experiments are shared; cookie stability, survivorship bias, selection bias, and perceived trends are discussed, along with methodologies that can partially address some of these issues.
Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned
- Computer Science
- KDD
- 2016
This paper focuses on how to develop meaningful and useful metrics for online experiments on online services, and shows how data-driven techniques and criteria can be applied in the metric development process.
Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale
- Computer Science
- 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
- 2019
It is revealed that most of the experiment analysis happens before OCEs are even started, and the key analysis steps are summarized in three checklists that can enable novice data scientists and software engineers to become more autonomous in setting up and analyzing experiments.
Trustworthy online controlled experiments: five puzzling outcomes explained
- Computer Science
- KDD
- 2012
The topics covered include: the OEC (Overall Evaluation Criterion), click tracking, effect trends, experiment length and power, and carryover effects, which should help readers increase the trustworthiness of the results coming out of controlled experiments.
Democratizing online controlled experiments at Booking.com
- Computer Science
- ArXiv
- 2017
This paper explains how building a central repository of successes and failures to allow for knowledge sharing, having a generic and extensible code library which enforces a loose coupling between experimentation and business logic, monitoring closely and transparently the quality and the reliability of the data gathering pipelines to build trust in the experimentation infrastructure, and putting in place safeguards to enable anyone to have end to end ownership of their experiments have allowed such a large organization as Booking.com to truly and successfully democratize experimentation.
Improving the sensitivity of online controlled experiments by utilizing pre-experiment data
- Computer Science
- WSDM '13
- 2013
This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
Online Experimentation at Microsoft
- Computer Science
- 2009
The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.
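The CUPED entry in the list above describes using pre-experiment data to reduce metric variability. As a rough illustration only, the sketch below shows the common control-variate form of that idea: each in-experiment value is shifted by theta times the centered pre-experiment value, with theta = cov(x, y) / var(x). The function name `cuped_adjust` and the synthetic data are hypothetical and are not taken from the cited paper; in practice theta is commonly estimated on pooled treatment and control data so that both arms receive the same adjustment.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return y_i - theta * (x_i - mean(x)) with theta = cov(x, y) / var(x).

    y: metric values observed during the experiment (one per user).
    x: the same users' values from the pre-experiment period.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    mean_x = fmean(x)
    mean_y = fmean(y)
    # Unnormalized sums are enough because the 1/n factors cancel in theta.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    if ss_x == 0:
        # A constant covariate carries no information; leave y unchanged.
        return list(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    theta = s_xy / ss_x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Synthetic example (assumed data): the in-experiment metric is correlated
# with the pre-experiment metric, so the adjustment should shrink variance.
rng = random.Random(0)
pre = [rng.gauss(10.0, 2.0) for _ in range(1000)]
post = [p + rng.gauss(0.0, 1.0) for p in pre]
adjusted = cuped_adjust(post, pre)
print("variance before:", pvariance(post))
print("variance after: ", pvariance(adjusted))
```

The adjustment leaves the sample mean unchanged because the centered covariate sums to zero, so only the variance around that mean is affected.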