From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

@inproceedings{Xu2015FromIT,
  title={From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks},
  author={Ya Xu and Nanyu Chen and Addrian Fernandez and Omar Sinno and Anmol Bhasin},
  booktitle={Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2015}
}
  • Ya Xu, Nanyu Chen, Addrian Fernandez, Omar Sinno, Anmol Bhasin
  • Published 10 August 2015
  • Computer Science
  • Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement with or satisfaction from a new service, feature, or product. We start with an introduction to the experimentation platform and how it is built to handle each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. This is followed by discussions of several more sophisticated A/B testing scenarios, such as running…
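
For readers unfamiliar with how such platforms split traffic, here is a minimal, illustrative sketch of deterministic bucket assignment by hashing. The function name, hash choice, and ramp parameter are assumptions made for this example, not LinkedIn's actual implementation.

```python
import hashlib

def assign_variant(experiment: str, member_id: int, ramp: float = 0.5) -> str:
    """Deterministically map a member to a variant (illustrative sketch only).

    Hashing (experiment, member_id) keeps a member's assignment stable across
    visits and independent across concurrently running experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000.0  # pseudo-uniform point in [0, 1)
    return "treatment" if bucket < ramp else "control"

# Example: ramp the treatment to 10% of members.
print(assign_variant("new-feed-ranker", member_id=42, ramp=0.10))
```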

Citations

Scalable Online Survey Framework: from Sampling to Analysis
TLDR
This paper starts by discussing how to handle multiple email surveys under such constraints, then shifts to the challenges of in-product surveys and how they are addressed at LinkedIn through a survey study conducted across two mobile apps.
D-Optimal Design for Network A/B Testing
TLDR
This paper proposes using a conditional auto-regressive (CAR) model to represent the network structure and to include network effects in the estimation and inference of the treatment effect, and develops a D-optimal design criterion based on the proposed model.
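
To give the shape of such a model, here is a hedged sketch of a CAR-style response model and a D-optimality criterion; the notation (adjacency matrix A, degree matrix D, treatment vector z) is assumed for illustration and may differ from the paper's exact formulation.

```latex
% A CAR-style response model for a network A/B test (notation assumed):
% y_i: response of unit i, z_i: treatment indicator, A: adjacency matrix,
% D: diagonal degree matrix, rho: network-dependence parameter.
y_i = \mu + \tau z_i + \phi_i + \varepsilon_i, \qquad
\varepsilon_i \overset{iid}{\sim} \mathcal{N}(0,\sigma^2), \qquad
\boldsymbol{\phi} \sim \mathcal{N}\!\big(\mathbf{0},\; \sigma_\phi^2 (D-\rho A)^{-1}\big).

% A D-optimal design picks the assignment vector z to maximize the information
% about (mu, tau) under the implied covariance Sigma of y:
\max_{\mathbf{z}\in\{0,1\}^n} \det\!\big(X^\top \Sigma^{-1} X\big), \qquad
X = [\mathbf{1}\;\; \mathbf{z}], \quad
\Sigma = \sigma_\phi^2 (D-\rho A)^{-1} + \sigma^2 I.
```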
Using Ego-Clusters to Measure Network Effects at LinkedIn
TLDR
This paper outlines a simple and scalable solution to measuring network effects using ego-network randomization, where a cluster consists of an "ego" (a focal individual) and her "alters" (the individuals she is immediately connected to).
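
As a rough illustration of the idea (a sketch under simplifying assumptions; the toy graph, greedy cluster selection, and hashing salt are invented for this example), one can form vertex-disjoint ego-clusters, assign one variant per cluster, and analyze metrics only on the egos.

```python
import hashlib

# Toy adjacency list standing in for the social graph (illustrative only).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}

def build_ego_clusters(graph):
    """Greedily pick vertex-disjoint ego-clusters: an ego plus all of her alters."""
    used, clusters = set(), {}
    for ego, alters in graph.items():
        cluster = {ego} | alters
        if cluster & used:
            continue  # skip overlapping candidates to keep clusters disjoint
        clusters[ego] = cluster
        used |= cluster
    return clusters

def cluster_variant(ego, salt="ego-experiment"):
    """Hash the ego id so every member of her cluster shares one variant."""
    digest = int(hashlib.sha256(f"{salt}:{ego}".encode()).hexdigest(), 16)
    return "treatment" if digest % 2 == 0 else "control"

# Only the egos' metrics would be analyzed; the alters exist to deliver (or
# withhold) the treated experience around each ego.
for ego, members in build_ego_clusters(graph).items():
    print(ego, sorted(members), cluster_variant(ego))
```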
Statistical Designs for Network A/B Testing
TLDR
The paper presents several algorithms that describe the procedure, tests the performance of the covariate-assisted Bayesian model on synthetic and real-world networks, and compares the results to a Bayesian sequential model that does not use network covariates in its posterior updates.
Network Experimentation at Scale
TLDR
This work describes the network experimentation framework, deployed at Facebook, which accounts for interference between experimental units, and introduces a cluster-based regression adjustment that substantially improves precision for estimating global treatment effects and a procedure to test for interference.
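
To make the precision argument concrete, here is a minimal simulated sketch (not Facebook's pipeline; the data, covariate, and effect sizes are invented) comparing a difference-in-means estimate with a regression-adjusted estimate at the cluster level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cluster-level data: one row per randomization cluster (illustrative).
n = 200
treat = rng.integers(0, 2, size=n)   # cluster-level treatment assignment
pre = rng.normal(size=n)             # pre-experiment covariate, e.g. baseline engagement
y = 0.3 * treat + 0.8 * pre + rng.normal(scale=1.0, size=n)

# Naive difference in means vs. regression-adjusted estimate of the effect.
naive = y[treat == 1].mean() - y[treat == 0].mean()

X = np.column_stack([np.ones(n), treat, pre - pre.mean()])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"difference in means:  {naive:.3f}")
print(f"regression adjusted:  {beta[1]:.3f}")  # typically closer to the true 0.3
```

The pre-experiment covariate absorbs much of the outcome variance, which is where the precision gain comes from.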
Top Challenges from the first Practical Online Controlled Experiments Summit
TLDR
This is the first paper to present the top challenges faced across the industry in running OCEs at scale, together with some common solutions.
Democratizing online controlled experiments at Booking.com
TLDR
This paper explains how an organization as large as Booking.com has truly and successfully democratized experimentation: by building a central repository of successes and failures to allow for knowledge sharing, by providing a generic and extensible code library that enforces a loose coupling between experimentation and business logic, by closely and transparently monitoring the quality and reliability of the data-gathering pipelines to build trust in the experimentation infrastructure, and by putting safeguards in place so that anyone can take end-to-end ownership of their experiments.
Designing and Analyzing A/B Tests in an Online Marketplace
TLDR
The approach is to adopt a regression model for the experiment response and test whether interference between test and control constitutes a statistically significant regressor; the authors advocate changing the randomization scheme and develop a system in support of that.
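
A hedged sketch of that kind of check (the simulated data, the "exposure" covariate, and the effect sizes are assumptions for illustration): regress the response on the unit's own treatment and on a measure of its exposure to treated neighbors, then inspect whether the exposure coefficient is significant.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated marketplace data (illustrative): each unit has its own treatment
# flag and an "exposure" covariate, e.g. the share of its neighbors treated.
n = 500
treat = rng.integers(0, 2, size=n)
exposure = rng.uniform(0.0, 1.0, size=n)
y = 0.2 * treat + 0.4 * exposure + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([treat, exposure]))
fit = sm.OLS(y, X).fit()

# A significant exposure coefficient is evidence that interference between
# test and control is contaminating the naive A/B comparison.
print(fit.params)    # [const, treat, exposure]
print(fit.pvalues)
```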
A/B Testing with APONE
TLDR
The authors developed and open-sourced APONE, an Academic Platform for ONline Experiments, which uses PlanOut, a framework and high-level language, to specify online experiments, and offers Web services and a Web GUI to easily create, manage, and monitor them.
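
Since APONE builds on PlanOut, a small example of PlanOut's Python API may help; this follows PlanOut's documented usage pattern, while the experiment name and parameters are made up for illustration.

```python
from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice, WeightedChoice

class ButtonExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # Deterministic, hash-based randomization keyed on the user id.
        params.button_color = UniformChoice(
            choices=["#ff0000", "#00ff00"], unit=userid)
        params.button_text = WeightedChoice(
            choices=["Join now!", "Sign up."], weights=[0.3, 0.7], unit=userid)

exp = ButtonExperiment(userid=42)
print(exp.get("button_color"), exp.get("button_text"))  # exposure is logged automatically
```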
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments
TLDR
This tutorial discusses challenges, best practices, and pitfalls in evaluating experiment results, covering lessons learned, practical guidelines, and open research questions.
