Corpus ID: 220713235

An evaluation framework for personalization strategy experiment designs

@article{Liu2020AnEF,
  title={An evaluation framework for personalization strategy experiment designs},
  author={C. H. Bryan Liu and Emma J. McCoy},
  journal={arXiv: Methodology},
  year={2020}
}
Online Controlled Experiments (OCEs) are the gold standard for evaluating the effectiveness of changes to websites. An important type of OCE evaluates different personalization strategies, which present challenges in the form of low test power and a lack of full control over group assignment. We argue that getting the right experiment setup -- the allocation of users to treatment/analysis groups -- should take precedence over post-hoc variance reduction techniques in order to enable the scaling of the number of… 
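
The low-power concern can be made concrete with a standard sample-size calculation for a two-group conversion-rate experiment. The sketch below is illustrative only; the baseline rate and minimum detectable effect are assumed numbers, not figures from the paper.

from scipy.stats import norm

def required_sample_size_per_group(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    p_treat = p_baseline + mde_abs
    p_pool = (p_baseline + p_treat) / 2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    # Standard normal-approximation formula for comparing two proportions
    n = ((z_alpha * (2 * p_pool * (1 - p_pool)) ** 0.5
          + z_beta * (p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)) ** 0.5) ** 2
         / mde_abs ** 2)
    return int(n) + 1

# Illustrative numbers only: 3% baseline conversion, 0.3 percentage point lift
print(required_sample_size_per_group(0.03, 0.003))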


Datasets for Online Controlled Experiments

This work presents the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes.

References

Showing 1-10 of 13 references

Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments

This work develops a general framework based on evaluating the mean difference between the actual and approximated values of the key performance metric, and proposes a new class of methods based on advanced machine learning algorithms that had not previously been applied to the problem of variance reduction.
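
As a rough illustration of regression-adjustment variance reduction with a boosted tree (a sketch of the general idea, not the authors' exact procedure), one can learn the covariate-to-metric relationship on the control group and analyse the residuals. All data and variable names below are synthetic assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 20_000
# Hypothetical pre-experiment covariates, random assignment, and an in-experiment metric
X = rng.normal(size=(n, 3))
treated = rng.integers(0, 2, size=n)
y = 1.0 + X @ np.array([0.8, 0.5, 0.2]) + 0.1 * treated + rng.normal(size=n)

# Learn the covariate-to-metric relationship on the control group only,
# then analyse residuals (actual minus approximated metric) for both groups
model = GradientBoostingRegressor().fit(X[treated == 0], y[treated == 0])
residual = y - model.predict(X)

naive = y[treated == 1].mean() - y[treated == 0].mean()
adjusted = residual[treated == 1].mean() - residual[treated == 0].mean()
print(f"naive estimate: {naive:.3f}   adjusted estimate: {adjusted:.3f}")
print(f"metric variance: {y.var():.2f}   residual variance: {residual.var():.2f}")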

A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments

This paper shares twelve common metric interpretation pitfalls, illustrating each pitfall with a puzzling example from a real experiment, and describes processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall.

Improving the sensitivity of online controlled experiments by utilizing pre-experiment data

This work proposes an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity in experiments, applicable to a wide variety of key business metrics.
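
A minimal sketch of the CUPED adjustment on assumed synthetic data: use a pre-experiment version of the metric as the covariate and subtract theta * (x_pre - mean(x_pre)) from the in-experiment metric, where theta is the regression coefficient of the metric on the covariate.

import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x_pre = rng.gamma(shape=2.0, scale=2.0, size=n)        # pre-experiment metric, e.g. past spend
treated = rng.integers(0, 2, size=n)
y = 0.7 * x_pre + 0.05 * treated + rng.normal(size=n)  # in-experiment metric

# CUPED: remove the part of y explained by the pre-experiment covariate
theta = np.cov(y, x_pre)[0, 1] / x_pre.var(ddof=1)
y_cuped = y - theta * (x_pre - x_pre.mean())

lift = y_cuped[treated == 1].mean() - y_cuped[treated == 0].mean()
print(f"variance before: {y.var():.2f}   after CUPED: {y_cuped.var():.2f}")
print(f"estimated lift: {lift:.3f}")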

Designing Experiments to Measure Incrementality on Facebook

This work generalises the statistical significance, power, and required sample size calculations for Facebook lift studies to multi-cell designs, and presents the results both theoretically, in terms of the distributions of test metrics, and in practical terms relating to the metrics used by practitioners.
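
One practical consequence of multi-cell designs is that splitting a fixed audience across more cells lowers per-cell power. The sketch below uses a normal-approximation power formula with assumed conversion rates; it is not the paper's derivation.

from scipy.stats import norm

def power_two_proportions(n_per_group, p_control, p_test, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    se = (p_control * (1 - p_control) / n_per_group
          + p_test * (1 - p_test) / n_per_group) ** 0.5
    z_alpha = norm.ppf(1 - alpha / 2)
    effect = abs(p_test - p_control)
    return norm.cdf(effect / se - z_alpha)

audience = 400_000
for cells in (1, 2, 4):                       # traffic split evenly across cells
    n_per_group = audience // (cells * 2)     # each cell has its own test/control split
    print(cells, round(power_two_proportions(n_per_group, 0.020, 0.022), 3))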

Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments

Instead of estimating the triggered treatment effect and then translating it to the overall population using a dilution formula, this paper combines the two steps into one streamlined analysis, producing a more accurate estimate of the overall treatment effect with even higher statistical power than a triggered analysis.
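
In its simplest additive form, the dilution step scales the triggered effect by the trigger rate, since untriggered users cannot be affected. A toy calculation with assumed numbers:

# Toy dilution calculation (numbers are illustrative, not from the paper).
n_total = 1_000_000
n_triggered = 150_000
triggered_effect = 0.010          # absolute lift among triggered users

trigger_rate = n_triggered / n_total
overall_effect = triggered_effect * trigger_rate
print(f"trigger rate: {trigger_rate:.2%}, diluted overall effect: {overall_effect:.4f}")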

Validating Bayesian Inference Algorithms with Simulation-Based Calibration

It is argued that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.
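
A minimal SBC loop for a conjugate normal-mean model, where the exact posterior is available in closed form, so the rank statistics should be uniform. This is only a sketch of the procedure, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(2)
prior_mu, prior_sd, noise_sd = 0.0, 1.0, 1.0
n_obs, n_draws, n_sims = 20, 200, 1000
ranks = []

for _ in range(n_sims):
    theta = rng.normal(prior_mu, prior_sd)                 # draw a parameter from the prior
    y = rng.normal(theta, noise_sd, size=n_obs)            # simulate data given that parameter
    # Exact conjugate posterior for a normal mean with known observation variance
    post_prec = 1 / prior_sd**2 + n_obs / noise_sd**2
    post_mean = (prior_mu / prior_sd**2 + y.sum() / noise_sd**2) / post_prec
    post_draws = rng.normal(post_mean, post_prec**-0.5, size=n_draws)
    ranks.append((post_draws < theta).sum())               # SBC rank statistic

# Under a correct inference procedure the ranks are uniform on {0, ..., n_draws}
hist, _ = np.histogram(ranks, bins=10, range=(0, n_draws))
print(hist)  # roughly equal counts indicate calibration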

What is the Value of Experimentation and Measurement?

  • C. H. Bryan Liu, B. Chamberlain, Emma J. McCoy
  • Data Science and Engineering
  • 2020

Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until…

Peeking at A/B Tests: Why it matters, and what to do about it

This paper provides simulations and numerical studies on Optimizely's data, demonstrating an improvement in detection performance over traditional methods.
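
The underlying problem is easy to reproduce: repeatedly testing an A/A experiment at a fixed 5% level as data accrue inflates the false-positive rate well beyond 5%. A small simulation with illustrative parameters:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_experiments, n_users, check_every = 1000, 2000, 100
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(size=n_users)   # A/A test: no true difference between groups
    b = rng.normal(size=n_users)
    for n in range(check_every, n_users + 1, check_every):
        if ttest_ind(a[:n], b[:n]).pvalue < 0.05:   # "peek" and stop at significance
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_experiments:.2%}")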

Online controlled experiments at large scale

This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.