• Corpus ID: 52919411

Contextual Multi-Armed Bandits for Causal Marketing

Neela Sawant, Chitti Babu Namballa, Narayanan Sadagopan, Houssam Nassif
This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. Focusing on causal effect leads to better return on investment (ROI) by targeting only the persuadable customers who wouldn't have taken the action organically. Our approach draws on strengths of causal inference, uplift modeling, and multi-armed bandits. It optimizes on causal treatment effects rather than pure outcome, and… 
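To make the distinction between optimizing on outcome and optimizing on incremental effect concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a randomized control group is available for each segment, and the segment names and counts are hypothetical, chosen only for illustration.

```python
def uplift(treated_conversions, treated_total, control_conversions, control_total):
    """Incremental conversion rate: P(action | treated) - P(action | control).

    Assumes treatment was assigned at random, so the difference in rates
    can be read as a causal effect.
    """
    if treated_total <= 0 or control_total <= 0:
        raise ValueError("both groups need at least one observation")
    return (treated_conversions / treated_total
            - control_conversions / control_total)


# Hypothetical data: (treated conversions, treated n, control conversions, control n)
segments = {
    "sure_things": (90, 100, 90, 100),   # convert with or without marketing
    "persuadables": (30, 100, 10, 100),  # convert mostly when treated
}


def best_segment_by_outcome(data):
    """Pick the segment with the highest raw conversion rate under treatment."""
    return max(data, key=lambda name: data[name][0] / data[name][1])


def best_segment_by_uplift(data):
    """Pick the segment with the highest incremental conversion rate."""
    return max(data, key=lambda name: uplift(*data[name]))


if __name__ == "__main__":
    print("by outcome:", best_segment_by_outcome(segments))
    print("by uplift: ", best_segment_by_uplift(segments))
```

With these made-up numbers, ranking by raw outcome favors the segment that would have converted anyway, while ranking by uplift favors the segment whose behavior the treatment actually changes.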

Uplifting Bandits
This work introduces a multi-armed bandit model, motivated by marketing campaigns and recommender systems, in which the reward is a sum of multiple random variables and each action alters the distributions of only some of them.
AutoML for Contextual Bandits
This work proposes an end-to-end automated meta-learning pipeline to approximate the optimal Q function for contextual bandit problems. The model performs much better than random exploration: it is more regret efficient and converges with a limited number of samples, while the meta-learning approach keeps it general and easy to use.
Online Inference for Advertising Auctions
Simulations show that the proposed method not only accomplishes the advertiser's goals, but does so at a much lower cost than more conventional experimentation policies aimed at performing causal inference.
Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits
Improved fixed-design confidence bounds for the linear logistic model improve upon the state-of-the-art bound by Li et al. (2017) and provide a lower bound highlighting a dependence on 1/κ for a family of instances.
An Experimental Design Approach for Regret Minimization in Logistic Bandits
This work improves upon the regret minimization of logistic bandits in the fixed arm setting by employing an experimental design procedure that achieves a minimax regret of $O(\sqrt{d \dot{\mu} T \log(|X|)})$.
Crowd Learning: Improving Online Decision Making Using Crowdsourced Data
We analyze an online learning problem that arises in crowdsourcing systems for users facing crowd-sourced data: a user at each discrete time step t can choose K out of a total of N options (bandits),
Decoupling Learning Rates Using Empirical Bayes Priors
This work proposes an Empirical Bayes approach to decouple the learning rates of first order and second order features in a Generalized Linear Model, and applies this method to a standard classification setting, as well as a contextual bandit setting in an Amazon production system.
Encrypted Linear Contextual Bandit
This paper introduces a privacy-preserving bandit framework based on homomorphic encryption which allows computations using encrypted data and shows that despite the complexity of the setting, it is possible to solve linear contextual bandits over encrypted data with a regret bound in any linear contextual bandit problem, while keeping data encrypted.
Bayesian Meta-Prior Learning Using Empirical Bayes
The empirical Bayes method clamps features in each group together and uses the deployed model’s observed data to empirically compute a hierarchical prior in hindsight, and reports theoretical results for the unbiasedness, strong consistency, and optimal frequentist cumulative regret properties of the meta-prior variance estimator.
Uplift Modeling for Multiple Treatments with Cost Optimization
  • Zhenyu Zhao, Totte Harinen
  • Computer Science
    2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
  • 2019
This paper extends standard uplift models to support multiple treatment groups with different costs, evaluates the approach on both synthetic and real data, and describes a production implementation.


Bandits with Unobserved Confounders: A Causal Approach
It is shown that to achieve low regret in certain realistic classes of bandit problems (namely, in the face of unobserved confounders), both experimental and observational quantities are required by the rational agent.
Estimating the Causal Impact of Recommendation Systems from Observational Data
This paper shows that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not, and applies a method for estimating causal effects from purely observational data to browsing logs containing anonymized activity on Amazon.com.
Thompson Sampling for Contextual Bandits with Linear Payoffs
This work designs and analyzes a generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary.
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
This work develops a learning principle and an efficient algorithm for batch learning from logged bandit feedback, and shows how CRM can be used to derive a new learning method, called Policy Optimizer for Exponential Models (POEM), for learning stochastic linear rules for structured output prediction.
Recursive partitioning for heterogeneous causal effects
This paper provides a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects, and proposes an “honest” approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation.
Batch learning from logged bandit feedback through counterfactual risk minimization
The empirical results show that the CRM objective implemented in POEM provides improved robustness and generalization performance compared to the state-of-the-art, and a decomposition of the POEM objective that enables efficient stochastic gradient optimization is presented.
Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology
This presentation will take "causality" not just as a casual concept implying some predictive association in a data set, and will illustrate why propensity score methods are generally superior in practice to the standard predictive approaches for estimating causal effects.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Machine Learning Methods for Estimating Heterogeneous Causal Effects
The method is closely related, but it differs in that it is tailored for predicting causal effects of a treatment rather than a unit’s outcome, which allows researchers to identify heterogeneity in treatment effects that was not specified in a pre-analysis plan, without concern about invalidating inference due to multiple testing.
Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study
This paper proposes to address the problem of estimating online metrics that depend on user feedback using causal inference techniques, under the contextual-bandit framework, and obtains very promising results that suggest the wide applicability of these techniques.