Corpus ID: 246904348

Should I send this notification? Optimizing push notifications decision making by modeling the future

Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J. Hunt
Most recommender systems are myopic: they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long-term user satisfaction. In this work we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose…


Offline Reinforcement Learning for Mobile Notifications
This paper proposes an offline reinforcement learning framework to optimize sequential notification decisions for driving user engagement, and describes a state-marginalized importance sampling policy evaluation approach, which can be used to evaluate the policy offline and tune learning hyperparameters.
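The paper's state-marginalized estimator builds on ordinary importance sampling for off-policy evaluation. A minimal sketch of the plain trajectory-wise estimator (not the state-marginalized variant; the function name and data layout are illustrative assumptions):

```python
def ois_estimate(trajectories):
    """Ordinary importance sampling estimate of a target policy's value.

    Each trajectory is a list of (reward, behavior_prob, target_prob)
    tuples logged under the behavior policy.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product over steps of pi(a|s) / mu(a|s)
        ret = 0.0     # undiscounted return of the trajectory
        for reward, mu, pi in traj:
            weight *= pi / mu
            ret += reward
        total += weight * ret
    return total / len(trajectories)
```

The variance of this estimator grows with the horizon; marginalizing over states, as the paper describes, is one way to control it.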
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
RecoGym is introduced: an RL environment for recommendation defined by a model of user traffic patterns on e-commerce sites and of users' responses to recommendations on publisher websites. It could open an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics.
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
This work develops SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates, and shows that the long-term value of a slate can be decomposed into a tractable function of its component item-wise LTVs.
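Under SLATEQ's user-choice assumptions, the long-term value of a slate decomposes into item-wise long-term values weighted by conditional click probabilities. A hedged sketch using a simple multinomial-logit-style choice model (the `no_click_score` parameter and function name are illustrative, not from the paper):

```python
def slate_q_value(item_scores, item_ltvs, no_click_score=1.0):
    """Decompose a slate's value into item-wise long-term values (LTVs).

    item_scores: user-affinity scores v(s, i) for each item on the slate
    item_ltvs:   per-item long-term values Q(s, i)
    Choice model: P(click i | slate) = v_i / (v_null + sum_j v_j).
    """
    denom = no_click_score + sum(item_scores)
    return sum(v * q / denom for v, q in zip(item_scores, item_ltvs))
```

This tractable form is what lets TD/Q-learning updates operate item-wise rather than over the combinatorial space of slates.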
Top-K Off-Policy Correction for a REINFORCE Recommender System
This work presents a general recipe for addressing biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm (REINFORCE), and proposes a novel top-K off-policy correction to account for the policy recommending multiple items at a time.
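The top-K correction multiplies the standard importance weight by a factor lambda_K(a) = K * (1 - pi(a))**(K - 1), reflecting that an item's contribution depends on it appearing anywhere in a sampled set of K. A minimal sketch (function and argument names assumed for illustration):

```python
def topk_corrected_weight(pi_a, beta_a, k):
    """Off-policy weight for a top-K REINFORCE recommender.

    pi_a:   target policy's probability of the logged item
    beta_a: behavior policy's probability of the logged item
    k:      number of items recommended at a time
    """
    lambda_k = k * (1.0 - pi_a) ** (k - 1)  # top-K correction factor
    return (pi_a / beta_a) * lambda_k
```

At k = 1 the factor is 1 and the standard single-item off-policy correction is recovered.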
Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
Extensive experiments on synthetic data and real-world large-scale data show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.
A State Transition Model for Mobile Notifications via Survival Analysis
This paper develops a survival model for badging notifications assuming a log-linear structure and a Weibull distribution and provides an online use case on notification delivery time optimization to show how to make better decisions, drive more user engagement, and provide more value to users.
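The badging model assumes a Weibull time-to-event distribution whose scale is log-linear in the features. A hedged sketch of the implied survival probability under those assumptions (feature vector and coefficient names are illustrative):

```python
import math

def weibull_survival(t, features, coefs, shape):
    """P(no response by time t) under a log-linear Weibull model.

    Scale: lambda = exp(x . beta); survival: S(t) = exp(-(t / lambda) ** shape).
    """
    scale = math.exp(sum(x * b for x, b in zip(features, coefs)))
    return math.exp(-((t / scale) ** shape))
```

With shape = 1 this reduces to an exponential model; shape > 1 captures response rates that rise over time since the badge appeared.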
Exploration in Recommender Systems
The roles of exploration in recommender systems are examined in three facets: 1) system exploration to reduce system uncertainty in regions with sparse feedback; 2) user exploration to introduce users to new interests/tastes; and 3) online exploration to take into account real-time user feedback.
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
A new batch RL algorithm called Short Horizon Policy Improvement (SHPI) is developed that approximates policy-induced distribution shifts across sessions and recovers well-known policy improvement schemes in the RL literature.
Near Real-time Optimization of Activity-based Notifications
This paper presents a strategy for optimizing notifications to balance various utilities (e.g., engagement, send volume) by formulating the problem as constrained optimization, and implements the solution in a stream computing system that makes multi-channel send decisions in near real-time.
Learning to Rank For Push Notifications Using Pairwise Expected Regret
An analysis of learning to rank for personalized mobile push notifications is contributed and a novel ranking loss based on weighting the pairwise loss between candidates by the expected regret incurred for misordering the pair is introduced.
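The idea can be sketched as a pairwise logistic loss in which each pair's contribution is weighted by the regret of misordering it; here the utility gap is used as an assumed proxy for that expected regret, and all names are illustrative:

```python
import math

def regret_weighted_pairwise_loss(scores, utilities):
    """Pairwise logistic ranking loss weighted by misordering regret.

    scores:    model scores for each candidate notification
    utilities: expected utility of each candidate; the regret of ranking
               j above i is taken as utilities[i] - utilities[j]
    """
    loss = 0.0
    for s_i, u_i in zip(scores, utilities):
        for s_j, u_j in zip(scores, utilities):
            if u_i > u_j:  # i should be ranked above j
                regret = u_i - u_j
                loss += regret * math.log1p(math.exp(-(s_i - s_j)))
    return loss
```

Pairs whose misordering costs little contribute little gradient, focusing the ranker on the swaps that matter for user experience.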