An evaluation framework for personalization strategy experiment designs
@article{Liu2020AnEF, title={An evaluation framework for personalization strategy experiment designs}, author={C. H. Bryan Liu and Emma J. McCoy}, journal={arXiv: Methodology}, year={2020} }
Online Controlled Experiments (OCEs) are the gold standard in evaluating the effectiveness of changes to websites. An important type of OCE evaluates different personalization strategies, which presents challenges in the form of low test power and a lack of full control over group assignment. We argue that getting the right experiment setup -- the allocation of users to treatment/analysis groups -- should take precedence over post-hoc variance reduction techniques in order to enable the scaling of the number of…
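To make the power considerations concrete, here is a minimal Monte Carlo sketch (not the paper's framework; the trigger rate, effect size, and sample sizes are invented for illustration) comparing the power of analysing all users against analysing only the "triggered" users who actually receive the personalized experience:

```python
# Illustrative sketch: test power under two analysis-group choices in a
# personalization OCE where only a fraction of users trigger the treatment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_users, trigger_rate, lift = 10_000, 0.2, 0.05   # hypothetical values
n_sims, alpha = 2_000, 0.05

power = {"all_users": 0, "triggered_only": 0}
for _ in range(n_sims):
    triggered = rng.random(n_users) < trigger_rate
    control = rng.normal(1.0, 1.0, n_users)
    treatment = rng.normal(1.0, 1.0, n_users)
    treatment[triggered] += lift          # effect only where personalization fires

    _, p_all = stats.ttest_ind(treatment, control)
    # The same mask stands in for the counterfactual trigger flag on control.
    _, p_trig = stats.ttest_ind(treatment[triggered], control[triggered])
    power["all_users"] += p_all < alpha
    power["triggered_only"] += p_trig < alpha

for design, hits in power.items():
    print(f"{design}: power ≈ {hits / n_sims:.2f}")
```

Restricting the analysis group to triggered users concentrates the diluted effect, which is why the allocation/analysis setup matters before any variance reduction is applied.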
One Citation
Datasets for Online Controlled Experiments
- Computer Science, NeurIPS Datasets and Benchmarks
- 2021
This work presents the first survey and taxonomy of OCE datasets, highlighting the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach for quickly deploying improvements or rolling back degrading changes.
References
Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments
- Computer Science, KDD
- 2016
This work develops a general framework based on evaluating the mean difference between the actual and approximated values of the key performance metric, and proposes a new class of variance reduction methods based on advanced machine learning algorithms not previously applied to this problem.
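A minimal sketch of the general regression-adjustment idea (not the paper's exact algorithm), using scikit-learn's GradientBoostingRegressor on simulated data with hypothetical covariates:

```python
# Predict the metric from pre-experiment covariates, then analyse residuals,
# whose variance is lower than the raw metric's.
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 3))                     # pre-experiment covariates
baseline = 2.0 + X[:, 0] + 0.5 * X[:, 1] ** 2   # user-level baseline behaviour
is_treat = rng.random(n) < 0.5
y = baseline + 0.1 * is_treat + rng.normal(scale=0.5, size=n)

# Fit on control only so the treatment effect does not leak into the model.
model = GradientBoostingRegressor().fit(X[~is_treat], y[~is_treat])
resid = y - model.predict(X)

print("raw      p =", stats.ttest_ind(y[is_treat], y[~is_treat]).pvalue)
print("adjusted p =", stats.ttest_ind(resid[is_treat], resid[~is_treat]).pvalue)
```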
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
- Computer Science, KDD
- 2017
This paper shares twelve common metric interpretation pitfalls, illustrating each with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid these pitfalls.
Improving the sensitivity of online controlled experiments by utilizing pre-experiment data
- Computer Science, WSDM
- 2013
This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments; the approach is applicable to a wide variety of key business metrics.
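The paper's core adjustment fits in a few lines: form Y_cuped = Y - theta * (X - mean(X)) with theta = cov(X, Y) / var(X), where X is the pre-experiment covariate. A minimal sketch on simulated data (the numbers are illustrative only):

```python
# CUPED: control-variate adjustment with a pre-experiment covariate.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(10, 3, n)                 # pre-experiment metric (e.g. past spend)
y = x + rng.normal(0, 1, n)              # in-experiment metric, correlated with x

theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

print("variance before:", y.var())       # ~10
print("variance after: ", y_cuped.var()) # ~1; mean differences are preserved
```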
Designing Experiments to Measure Incrementality on Facebook
- Business, ArXiv
- 2018
This work generalises the statistical significance, power, and required sample size calculations to multi-cell Facebook lift studies, presenting the results both theoretically, in terms of the distributions of test metrics, and practically, in terms of the metrics used by practitioners.
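For context, the textbook two-sample sample-size formula that such lift-study calculations build on (the standard single-cell formula, not the paper's multi-cell generalisation):

```python
# n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Users needed per group to detect a mean difference `delta`."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2

# e.g. detecting a 0.1 lift on a metric with standard deviation 2:
print(round(n_per_group(delta=0.1, sigma=2)))   # ≈ 6280 per group
```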
Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments
- Computer Science, WSDM
- 2015
Instead of estimating the triggered treatment effect and then translating it to the overall population with a dilution formula, this paper combines the two steps into one streamlined analysis, producing a more accurate estimate of the overall treatment effect with even higher statistical power than a triggered analysis.
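The classical two-step approach being streamlined here amounts to, for an absolute effect, the simple dilution below (the numbers are hypothetical):

```python
# Step 1: estimate the effect among triggered users; step 2: dilute it
# to the overall population: overall ≈ triggered_effect * trigger_rate.
triggered_effect = 0.50   # illustrative lift among triggered users
trigger_rate = 0.12       # fraction of users who hit the trigger condition
print("diluted overall effect:", triggered_effect * trigger_rate)  # 0.06
```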
Validating Bayesian Inference Algorithms with Simulation-Based Calibration
- Computer Science, Biology
- 2018
It is argued that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.
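A minimal SBC sketch on a conjugate normal-normal model, where the exact posterior is known, so the rank statistics should come out uniform; any systematic deviation would flag a bug in the inference code (the model and all constants here are illustrative choices, not from the paper):

```python
# Simulation-based calibration: draw theta from the prior, simulate data,
# draw from the (here analytic) posterior, and record theta's rank.
import numpy as np

rng = np.random.default_rng(7)
n_obs, n_post, ranks = 10, 100, []

for _ in range(1_000):
    theta = rng.normal(0, 1)                         # draw from prior N(0, 1)
    y = rng.normal(theta, 1, n_obs)                  # simulate data
    post_var = 1 / (1 + n_obs)                       # conjugate posterior
    post_mean = post_var * y.sum()
    draws = rng.normal(post_mean, np.sqrt(post_var), n_post)
    ranks.append((draws < theta).sum())              # rank of theta in draws

hist, _ = np.histogram(ranks, bins=10, range=(0, n_post + 1))
print(hist)   # roughly flat (~100 per bin) if the sampler is calibrated
```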
What is the Value of Experimentation and Measurement?
- Business, Data Science and Engineering
- 2020
Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until…
Peeking at A/B Tests: Why it matters, and what to do about it
- Computer Science, KDD
- 2017
This paper shows how continuously monitoring ("peeking" at) experiment results inflates false positive rates, and provides simulations and numerical studies on Optimizely's data demonstrating an improvement in detection performance over traditional fixed-horizon methods.
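The problem the paper addresses can be demonstrated in a few lines (a toy simulation, not the paper's proposed remedy): under the null, checking a fixed-horizon t-test after every batch of users and stopping at the first p < 0.05 inflates the false positive rate well above the nominal 5%.

```python
# Peeking demo: repeated significance testing under the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, n_batches, batch = 1_000, 20, 100
false_pos = 0

for _ in range(n_sims):
    a, b = np.empty(0), np.empty(0)
    for _ in range(n_batches):
        a = np.append(a, rng.normal(size=batch))   # control; null is true
        b = np.append(b, rng.normal(size=batch))   # treatment; no real effect
        if stats.ttest_ind(a, b).pvalue < 0.05:    # peek after each batch
            false_pos += 1
            break

print("false positive rate with peeking:", false_pos / n_sims)  # well above 0.05
```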
Online controlled experiments at large scale
- Computer Science, KDD
- 2013
This work discusses why negative experiments, which degrade the user experience in the short term, should be run given their learning value and long-term benefits, and describes the design of a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.