Corpus ID: 170079159

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

@article{Ie2019ReinforcementLF,
  title={Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology},
  author={Eugene Ie and Vihan Jain and Jing Wang and Sanmit Narvekar and Ritesh Agarwal and Rui Wu and Heng-Tze Cheng and Morgane Lustman and Vince Gatto and Paul Covington and Jim McFadden and Tushar Chandra and Craig Boutilier},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.12767}
}
Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items - which may have interacting effects on user choice - methods are required to deal with the combinatorics of the RL action space. In this work, we… 
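A rough sketch of the slate decomposition this work develops (the notation below is assumed for illustration and is not quoted from this page; see also the SlateQ entry below): let P(i | s, A) denote a user choice model giving the probability that a user in state s consumes item i from slate A, and let \bar{Q}(s, i) denote the long-term value (LTV) of item i. The slate value then factorizes as

    Q(s, A) = \sum_{i \in A} P(i \mid s, A) \, \bar{Q}(s, i),

so only item-wise values need to be learned, for instance with a temporal-difference update of the (on-policy) form

    \bar{Q}(s, i) \leftarrow r(s, i) + \gamma \sum_{j \in A'} P(j \mid s', A') \, \bar{Q}(s', j),

where s' is the next user state and A' the next slate. Slate construction then reduces to choosing A to maximize the weighted sum above, instead of learning a Q-function over an exponentially large slate-level action space.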

SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
TLDR: SLATEQ, a decomposition of value-based temporal-difference and Q-learning, is developed; it renders RL with slates tractable and shows that the long-term value of a slate can be decomposed into a tractable function of its component item-wise LTVs.
Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
TLDR: This work builds on recent advances in batch (offline) RL and distributional RL to learn from offline logs while handling the intrinsically stochastic rewards that arise from users' varied latent interest preferences (environments).
Reinforcement Learning based Recommender Systems: A Survey
TLDR: A survey of reinforcement learning based recommender systems (RLRSs) is presented; it recognizes and illustrates that RLRSs can be broadly classified into RL- and DRL-based methods, and proposes an RLRS framework with four components: state representation, policy optimization, reward formulation, and environment building.
Reinforcement Learning for Strategic Recommendations
TLDR: Various use cases and research challenges in making strategic recommendations practical are covered, including point-of-interest recommendations, tutorial recommendations, next-step guidance in multimedia editing software, and ad recommendation for optimizing lifetime value.
Hierarchical Reinforcement Learning for Integrated Recommendation
TLDR: A novel hierarchical reinforcement learning framework for integrated recommendation (HRL-Rec) is proposed; it divides integrated recommendation into two tasks, recommending channels and items sequentially, and designs various rewards for both recommendation accuracy and diversity.
Estimating and Penalizing Preference Shift in Recommender Systems
TLDR: This work advocates estimating the preference shifts that recommender system policies would induce, explicitly characterizing which shifts are unwanted, and assessing before deployment whether such policies will produce them; it shows that recommender systems optimized to stay within a trust region avoid manipulative behaviors while still generating engagement.
Variation Control and Evaluation for Generative Slate Recommendations
TLDR: This paper proposes to enhance accuracy-based evaluation with slate variation metrics that estimate the stochastic behavior of generative models, and shows that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, expanding the “elbow” performance to an easy-to-find region.
State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study
TLDR: Experimental results indicate that existing findings do not generalize to the debiased SOFA simulator generated from a different dataset, nor to a Deep Q-Network (DQN)-based method, when more state encoders are compared.
A Load Balanced Recommendation Approach
TLDR: This paper proposes the Load Balanced Recommender System (LBRS), which uses a probabilistic scheme for item recommendation, and introduces a new diversity metric that emphasizes the importance of diversity not only from an intra-list perspective but also from a between-list point of view.
Do Offline Metrics Predict Online Performance in Recommender Systems?
TLDR: This work investigates the extent to which offline metrics predict online performance by evaluating eleven recommenders across six controlled simulated environments, studies the impact of adding exploration strategies, and observes that their effectiveness is highly dependent on the recommendation algorithm.

References

Showing 1-10 of 74 references
SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
TLDR: SLATEQ, a decomposition of value-based temporal-difference and Q-learning, is developed; it renders RL with slates tractable and shows that the long-term value of a slate can be decomposed into a tractable function of its component item-wise LTVs.
Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees
TLDR: Results show that an RL algorithm equipped with these off-policy evaluation techniques outperforms the myopic approaches and give fundamental insights into the difference between the click-through rate (CTR) and lifetime value (LTV) metrics for evaluating the performance of a PAR algorithm.
Automatic Representation for Lifetime Value Recommender Systems
TLDR: This paper proposes a new architecture for combining RL with recommendation systems which obviates the need for hand-tuned features, thus automating the state-space representation construction process.
Top-K Off-Policy Correction for a REINFORCE Recommender System
TLDR: This work presents a general recipe for addressing biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm (REINFORCE), and proposes a novel top-K off-policy correction to account for the policy recommending multiple items at a time.
Reinforcement Learning based Recommender System using Biclustering Technique
TLDR: This paper formulates a novel RL-based recommender system as a gridworld game, using a biclustering technique that can reduce the state and action space significantly, and improves recommendation quality while effectively handling the cold-start problem.
Deep reinforcement learning for page-wise recommendations
TLDR: A principled approach is proposed to jointly generate a set of complementary items and the corresponding strategy for displaying them in a 2-D page, along with DeepPage, a novel page-wise recommendation framework based on deep reinforcement learning that can optimize a page of items with proper display based on real-time feedback from users.
Collaborative Deep Learning for Recommender Systems
TLDR: A hierarchical Bayesian model called collaborative deep learning (CDL) is proposed, which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix, and can significantly advance the state of the art.
Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions
TLDR: The new agent's superiority over agents that ignore either the combinatorial or the sequential long-term value aspect is demonstrated on a range of environments with dynamics from a real-world recommendation system.
Usage-based web recommendations: a reinforcement learning approach
TLDR: This paper proposes that the reinforcement learning paradigm provides an appropriate model for the recommendation problem, as well as a framework in which the system constantly interacts with the user and learns from her behavior, and shows how this approach can improve the quality of web recommendations.
Jointly Leveraging Intent and Interaction Signals to Predict User Satisfaction with Slate Recommendations
TLDR: It is hypothesized that user interactions are conditional on the specific intent users have when interacting with a recommendation system, and the need for explicitly considering user intent when interpreting interaction signals is highlighted.