Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

@inproceedings{Zou2019ReinforcementLT,
  title={Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems},
  author={Lixin Zou and Long Xia and Zhuoye Ding and Jiaxing Song and Weidong Liu and Dawei Yin},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2019}
}
  • Lixin Zou, Long Xia, +3 authors Dawei Yin
  • Published 2019
  • Computer Science
  • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Recommender systems play a crucial role in our daily lives. [...] FeedRec includes two components: 1) a Q-Network, designed as a hierarchical LSTM, which models complex user behaviors, and 2) an S-Network, which simulates the environment, assists the Q-Network, and avoids the instability of convergence in policy learning. Extensive experiments on synthetic data and a real-world large-scale dataset show that FeedRec effectively optimizes long-term user engagement and outperforms state-of…
Self-Supervised Reinforcement Learning for Recommender Systems
TLDR
This paper proposes two frameworks, namely Self-Supervised Q-learning and Self-Supervised Actor-Critic, and integrates them with four state-of-the-art recommendation models, demonstrating the effectiveness of the approach on real-world datasets.
Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation
Recommender systems play a crucial role in modern e-commerce platforms. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. In order…
Pseudo Dyna-Q: A Reinforcement Learning Framework for Interactive Recommendation
TLDR
The proposed PDQ not only avoids the instability of convergence and high computation cost of existing approaches but also provides unlimited interactions without involving real customers, and a proven upper bound on the empirical error of the reward function guarantees that the learned offline policy has lower bias and variance.
State representation modeling for deep reinforcement learning based recommendation
TLDR
Inspired by recent advances in feature interaction modeling in user response prediction, it is discovered that explicitly modeling user–item interactions in state representation can largely help the recommendation policy perform effective reinforcement learning.
Neural Interactive Collaborative Filtering
TLDR
The key insight is that the satisfied recommendations triggered by an exploration recommendation can be viewed as the exploration bonus (delayed reward) for its contribution to improving the quality of the user profile.
Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning
TLDR
This work investigates the potential of leveraging a knowledge graph (KG) to address issues of RL methods for IRS: the KG provides rich side information for recommendation decision making, and prior knowledge of item correlations learned from the KG guides candidate selection for better candidate item retrieval.
Top-aware reinforcement learning based recommendation
TLDR
A Supervised deep Reinforcement learning Recommendation framework named SRR is proposed, which utilizes a supervised learning model to partially guide the learning of the recommendation policy, where the supervision signal and RL signal are jointly employed and updated in a complementary fashion.
Reinforcement learning based recommender systems: A survey
TLDR
A survey on reinforcement learning based recommender systems (RLRSs) is presented, and algorithms developed for RLRSs can be generally classified into RL- and DRL-based methods, e.g., Q-learning, SARSA, and REINFORCE.
Reinforcement Recommendation with User Multi-aspect Preference
TLDR
This paper considers how to model user multi-aspect preferences in RL-based recommender systems, and bases the model on the deterministic policy gradient (DPG) framework, which is effective in dealing with large action spaces.
KERL: A Knowledge-Guided Reinforcement Learning Model for Sequential Recommendation
TLDR
This work formalizes the sequential recommendation task as a Markov Decision Process (MDP) and makes three major technical extensions in this framework: state representation, reward function, and learning algorithm. It is the first time that knowledge information has been explicitly discussed and utilized in RL-based sequential recommenders, especially for the exploration process.

References

SHOWING 1-10 OF 41 REFERENCES
Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems
TLDR
This work rigorously proves that, with high probability, its proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making the recommendations.
Deep Reinforcement Learning for List-wise Recommendations
TLDR
This paper proposes a novel recommender system with the capability of continuously improving its strategies during the interactions with users and introduces an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online.
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning
TLDR
This paper models the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverages Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items through trial and error and receiving reinforcement for these items from users' feedback.
Beyond clicks: dwell time for personalization
TLDR
A novel method to compute accurate dwell time based on client-side and server-side logging is described, and normalization of dwell time across different devices and contexts is demonstrated.
Deep reinforcement learning for page-wise recommendations
TLDR
A principled approach is proposed to jointly generate a set of complementary items and the corresponding strategy to display them in a 2-D page, along with DeepPage, a novel page-wise recommendation framework based on deep reinforcement learning that can optimize a page of items with proper display based on real-time feedback from users.
Deep Reinforcement Learning for Search, Recommendation, and Online Advertising: A Survey
TLDR
An overview of deep reinforcement learning for search, recommendation, and online advertising, from methodologies to applications, is given; representative algorithms are reviewed and some appealing research directions are discussed.
Session-based Recommendations with Recurrent Neural Networks
TLDR
It is argued that by modeling the whole session, an RNN-based approach can provide more accurate session-based recommendations, and several modifications to classic RNNs, such as a ranking loss function, are introduced that make it more viable for this specific problem.
Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit
TLDR
A dynamical context drift model based on particle learning is proposed that is able to effectively capture the context change and learn the latent parameters of a contextual multi-armed bandit problem where the reward mapping function changes over time.
Partially Observable Markov Decision Process for Recommender Systems
TLDR
The POMDP-Rec framework is proposed, which is a neural-optimized Partially Observable Markov Decision Process algorithm for recommender systems and automatically achieves results comparable to those of models fine-tuned exhaustively by domain experts on public datasets.
Improving recommender systems with adaptive conversational strategies
TLDR
It is shown that the optimal strategy is different from the fixed one, and supports more effective and efficient interaction sessions, and allows conversational systems to autonomously improve a fixed strategy and eventually learn a better one using reinforcement learning techniques.