Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation

  • Luo Ji, Qin Qi, Bingqing Han, Hongxia Yang
  • Published 20 August 2021
  • Computer Science
  • Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Recommender systems play a crucial role in modern E-commerce platforms. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. To alleviate the cold-start issue, most existing methods introduce content and contextual information as auxiliary information. Nevertheless, these methods assume the recommended items behave steadily over time, while in a typical E-commerce scenario, items generally have very different…

A Survey on Reinforcement Learning for Recommender Systems

This survey provides a thorough overview, comparison, and summary of RL approaches applied in four typical recommender scenarios: interactive recommendation, conversational recommendation, sequential recommendation, and explainable recommendation.

Surrogate for Long-Term User Experience in Recommender Systems

A large-scale study of user behavior logs on one of the largest industrial recommendation platforms, serving billions of users, finds a subset of user behaviors that are predictive of users' increased visits to the platform in 5 months, among the group of users with the same visiting frequency to begin with.



Deep Reinforcement Learning for List-wise Recommendations

This paper proposes a novel recommender system with the capability of continuously improving its strategies during the interactions with users and introduces an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online.

Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation

This paper addresses the cold-start challenge in the RL-based recommender systems by proposing a meta-level model-based reinforcement learning approach for fast user adaptation, and learns to infer each user's preference with a user context variable that enables recommendation systems to better adapt to new users with few interactions.

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Extensive experiments on synthetic data and real-world large-scale data show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

This paper models the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverages Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items on a trial-and-error basis and receiving reinforcements for these items from users' feedback.

Addressing the Item Cold-Start Problem by Attribute-Driven Active Learning

This paper designs useful user selection criteria based on items’ attributes and users’ rating history, and combines the criteria in an optimization framework for selecting users, and generates accurate rating predictions for the other unselected users.

Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation

This paper proposes two techniques to alleviate the unstable reward estimation problem in dynamic environments, the stratified sampling replay strategy and the approximate regretted reward, which address the problem from the sample aspect and the reward aspect, respectively.

Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems

This work rigorously proves that with a high probability its proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making the recommendations.

Deep reinforcement learning for page-wise recommendations

This paper proposes a principled approach to jointly generate a set of complementary items and the corresponding strategy to display them in a 2-D page, together with DeepPage, a novel page-wise recommendation framework based on deep reinforcement learning that can optimize a page of items with proper display based on real-time feedback from users.

DRN: A Deep Reinforcement Learning Framework for News Recommendation

A Deep Q-Learning based recommendation framework that models future reward explicitly is proposed; it considers the user return pattern as a supplement to the click / no-click label in order to capture more user feedback information.

Top-K Off-Policy Correction for a REINFORCE Recommender System

This work presents a general recipe for addressing biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e. REINFORCE, and proposes a novel top-K off-policy correction to account for the policy recommending multiple items at a time.