Fatigue-Aware Bandits for Dependent Click Models

Junyu Cao, Wei Sun, Zuo‐Jun Max Shen, Markus Ettl
As recommender systems send massive amounts of content to keep users engaged, users may experience fatigue, which is driven by 1) overexposure to irrelevant content and 2) boredom from seeing too many similar recommendations. To address this problem, we consider an online learning setting where a platform learns a recommendation policy that takes user fatigue into account. We propose an extension of the Dependent Click Model (DCM) to describe users' behavior. We stipulate that for…

Modeling Attrition in Recommender Systems with Departing Bandits
This work proposes a novel multi-armed bandit setup that captures policy-dependent horizons and provides an efficient learning algorithm that achieves O(√T) regret, where T is the number of users.


User Fatigue in Online News Recommendation
An analysis of user behavioral logs from Bing Now news recommendation finds that user fatigue is a severe problem that greatly affects the user experience. Experimental results indicate that significant gains can be achieved by introducing features that reflect users' interaction with previously seen recommendations.
Thompson Sampling for a Fatigue-aware Online Recommendation System
A new Thompson sampling-based algorithm is proposed whose expected regret is polynomial in the number of items in this combinatorial setting and which performs extremely well in practice.
Just in Time Recommendations: Modeling the Dynamics of Boredom in Activity Streams
This paper analyzes user activity streams, shows that users' temporal consumption of familiar items is driven by boredom, and models this behavior using a Hidden Semi-Markov Model for the gaps between user consumption activities.
Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue
This work proposes a novel sequential choice model to capture multiple interactions taking place between the platform and its user, proposes an algorithm that balances exploration and exploitation, and characterizes its regret bound.
Fighting Boredom in Recommender Systems with Linear Reinforcement Learning
This paper casts the problem as a Markov decision process, where the rewards are a linear function of the recent history of actions, and shows that a policy considering the long-term influence of the recommendations may outperform both fixed-action and contextual greedy policies.
DCM Bandits: Learning to Rank with Multiple Clicks
This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model, and proposes DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages.
Efficient multiple-click models in web search
This paper presents two multiple-click models: the independent click model which is reformulated from previous work, and the dependent click model (DCM) which takes into consideration dependencies between multiple clicks.
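The click behavior that the DCM describes can be illustrated with a small simulation. This is a minimal sketch, assuming the commonly cited simplified form of the model: the user scans the list top-down, clicks an examined item with its attraction probability, and after each click keeps browsing with a position-dependent continuation probability. The function name `simulate_dcm_session` and its parameters are hypothetical and are not taken from any of the papers listed here.

```python
import random


def simulate_dcm_session(attraction, continuation, rng):
    """Simulate one user pass over a ranked list under a simplified DCM.

    attraction[k]   -- assumed probability that the item at position k is
                       clicked once the user examines it
    continuation[k] -- assumed probability that the user keeps browsing
                       after a click at position k
    rng             -- a random.Random instance (source of randomness)

    Returns the list of clicked positions (0-indexed).
    """
    clicks = []
    for k, (attract_prob, continue_prob) in enumerate(zip(attraction, continuation)):
        if rng.random() < attract_prob:
            clicks.append(k)
            # After a click, the user leaves unless they choose to continue.
            if rng.random() >= continue_prob:
                break
        # Without a click, the user always moves on to the next position.
    return clicks


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_dcm_session([0.6, 0.3, 0.5], [0.7, 0.5, 0.2], rng))
```

Because a session can end after any click, the model allows multiple clicks per list, which is the dependency between clicks that distinguishes it from single-click cascade models.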
Multiple-Play Bandits in the Position-Based Model
This work proposes to exploit available information regarding the display position bias under the so-called Position-based click model (PBM), and provides a novel regret lower bound for this model as well as computationally efficient algorithms that display good empirical and theoretical performance.
Improving recommendation lists through topic diversification
This work presents topic diversification, a novel method designed to balance and diversify personalized recommendation lists in order to reflect the user's complete spectrum of interests, and introduces the intra-list similarity metric to assess the topical diversity of recommendation lists.
Predicting clicks: estimating the click-through rate for new ads
This work shows that features of ads, terms, and advertisers can be used to learn a model that accurately predicts the click-through rate for new ads, and shows that using this model improves the convergence and performance of an advertising system.