Corpus ID: 246904348

Should I send this notification? Optimizing push notifications decision making by modeling the future

@article{OBrien2022ShouldIS,
  title={Should I send this notification? Optimizing push notifications decision making by modeling the future},
  author={Conor O'Brien and Huasen Wu and Shaodan Zhai and Dalin Guo and Wenzhe Shi and Jonathan J. Hunt},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.08812}
}
Most recommender systems are myopic; that is, they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long-term user satisfaction. In this work we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose… 

References

Offline Reinforcement Learning for Mobile Notifications
TLDR
This paper proposes an offline reinforcement learning framework to optimize sequential notification decisions for driving user engagement, and describes a state-marginalized importance sampling policy evaluation approach, which can be used to evaluate the policy offline and tune learning hyperparameters.
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
TLDR
RecoGym is introduced, an RL environment for recommendation defined by a model of user traffic patterns on e-commerce and the users' response to recommendations on publisher websites; it could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics.
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
TLDR
This work develops SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates, and shows that the long-term value of a slate can be decomposed into a tractable function of its component item-wise LTVs.
Top-K Off-Policy Correction for a REINFORCE Recommender System
TLDR
This work presents a general recipe for addressing biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e., REINFORCE, and proposes a novel top-K off-policy correction to account for the policy recommending multiple items at a time.
Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
TLDR
Extensive experiments on synthetic data and real-world large-scale data show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.
A State Transition Model for Mobile Notifications via Survival Analysis
TLDR
This paper develops a survival model for badging notifications assuming a log-linear structure and a Weibull distribution and provides an online use case on notification delivery time optimization to show how to make better decisions, drive more user engagement, and provide more value to users.
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
TLDR
A new batch RL algorithm called Short Horizon Policy Improvement (SHPI) is developed that approximates policy-induced distribution shifts across sessions and recovers well-known policy improvement schemes in the RL literature.
Near Real-time Optimization of Activity-based Notifications
TLDR
This paper presents the strategy of optimizing notifications to balance various utilities (e.g., engagement, send volume) by formulating the problem using constrained optimization and implements the solution in a stream computing system in which it makes multi-channel send decisions in near real-time.
Learning to Rank For Push Notifications Using Pairwise Expected Regret
TLDR
An analysis of learning to rank for personalized mobile push notifications is contributed and a novel ranking loss based on weighting the pairwise loss between candidates by the expected regret incurred for misordering the pair is introduced.
RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems
TLDR
This work describes RecSim NG and illustrates how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.