Corpus ID: 239024808

Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations

Yu Song, Jianxun Lian, Shuai Sun, Hong Huang, Yu Li, Hai Jin, Xing Xie
User interest exploration is an important and challenging topic in recommender systems: it alleviates the closed-loop effects between recommendation models and user-item interactions. Contextual bandit (CB) algorithms strive to make a good trade-off between exploration and exploitation so that users' potential interests have a chance to be exposed. However, classical CB algorithms can only be applied to a small, sampled item set (usually hundreds of items), which forces the typical applications in…
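The exploration/exploitation trade-off and the "small, sampled item set" restriction can be illustrated with epsilon-greedy, one of the simplest bandit policies. This is a minimal sketch, not the paper's method: it assumes a linear reward model, and the names `select_from_sampled_candidates`, `theta`, `k`, and `epsilon` are hypothetical. Items outside the sampled subset get no chance to be explored in that round.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_from_sampled_candidates(catalog, theta, k, epsilon, rng):
    """Epsilon-greedy selection restricted to a random sample of the catalog.

    catalog: dict item_id -> feature vector (list of floats)
    theta:   current linear estimate of the user's preference vector (assumed model)
    k:       size of the sampled candidate set (hypothetical parameter)
    epsilon: probability of exploring instead of exploiting
    """
    # Classical CB setting: only k items are considered this round.
    candidates = rng.sample(sorted(catalog), k)
    if rng.random() < epsilon:
        # Explore: pick a uniformly random candidate.
        return rng.choice(candidates), candidates
    # Exploit: pick the candidate with the highest estimated reward.
    best = max(candidates, key=lambda i: dot(theta, catalog[i]))
    return best, candidates

rng = random.Random(0)
catalog = {i: [rng.random(), rng.random()] for i in range(10_000)}
item, cands = select_from_sampled_candidates(catalog, [1.0, -0.5], k=200, epsilon=0.1, rng=rng)
```

However large the catalog is, each decision only ever compares `k` items, which is the limitation the abstract points to.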



Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms
An efficient online algorithm based on particle learning is developed for inferring both the latent parameters and the states of the model, taking advantage of the fully adaptive inference strategy of particle learning techniques; the inferred model can be naturally integrated with existing multi-armed selection strategies in an interactive collaborative filtering setting.
Learning Tree-based Deep Model for Recommender Systems
This work proposes a novel tree-based method that provides logarithmic complexity with respect to corpus size, even with more expressive models such as deep neural networks. The tree structure can be learned jointly for better compatibility with users' interest distribution, which facilitates both training and prediction.
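The logarithmic cost comes from searching a tree top-down instead of scoring every item. The following minimal sketch uses beam search over a complete binary tree in heap layout. It assumes the number of items is a power of two and uses a hypothetical toy scoring rule (a node's score is the best leaf score in its subtree); in the paper, node scores come from a learned model. With beam width `b`, only about `2 * b * log2(N)` nodes are scored.

```python
import heapq

def build_max_scores(leaf_values):
    """Toy node scores for a complete binary tree in heap layout (root = 1).

    Assumption for illustration: a node's score is the best leaf score
    in its subtree, so level-by-level beam search recovers the top items.
    """
    n = len(leaf_values)
    scores = [0.0] * (2 * n)
    for leaf in range(n, 2 * n):
        scores[leaf] = leaf_values[leaf - n]
    for node in range(n - 1, 0, -1):
        scores[node] = max(scores[2 * node], scores[2 * node + 1])
    return scores

def beam_search_tree(num_leaves, node_score, beam_width):
    """Top-down beam search; leaves num_leaves..2*num_leaves-1 map to items."""
    beam = [1]  # start at the root
    # All nodes in the beam sit on the same level, so checking one suffices.
    while beam[0] < num_leaves:
        children = [c for n in beam for c in (2 * n, 2 * n + 1)]
        # Keep only the beam_width highest-scoring children at this level.
        beam = heapq.nlargest(beam_width, children, key=node_score)
    return [leaf - num_leaves for leaf in beam]

scores = build_max_scores([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6])
top_items = beam_search_tree(8, lambda node: scores[node], beam_width=2)
```

Here `top_items` is `[1, 5]`, the two items with the highest leaf scores, found without scoring every leaf.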
Collaborative Filtering Bandits
This work investigates an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings, showing scalability and increased prediction performance over state-of-the-art methods for clustering bandits on medium-size real-world datasets.
A contextual-bandit approach to personalized news article recommendation
This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
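The news-recommendation paper above introduced LinUCB. The following is a minimal pure-Python sketch of its disjoint variant. It assumes one linear model per article and an exploration weight `alpha`. The names `LinUCBArm` and `choose` are hypothetical. To avoid inverting A each round, the sketch maintains A^{-1} directly using the Sherman-Morrison identity.

```python
import math

class LinUCBArm:
    """Per-arm state for disjoint LinUCB: A = I + sum x x^T, b = sum r x."""

    def __init__(self, d):
        # A starts as the identity, so A^{-1} does too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _mat_vec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward theta.x plus alpha * sqrt(x^T A^{-1} x)."""
        theta = self._mat_vec(self.b)  # theta = A^{-1} b (ridge estimate)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * a for xi, a in zip(x, self._mat_vec(x))))
        return mean + alpha * width

    def update(self, x, reward):
        Ax = self._mat_vec(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        d = len(x)
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                      for i in range(d)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose(arms, contexts, alpha):
    """Pick the arm (article) with the highest upper confidence bound."""
    return max(arms, key=lambda a: arms[a].ucb(contexts[a], alpha))
```

An arm that has received a click gets a higher estimate, while an untried arm keeps a wide confidence term. With `alpha = 1`, an arm that was shown once without a click scores below a fresh arm, so the fresh arm gets explored.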
Bandits and Recommender Systems
The combination of matrix factorization and bandit algorithms to solve the online recommendation problem is driven by viewing recommendation as a feedback-controlled loop, and it leads to interactions between representation learning and the recommendation policy.
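The interaction between representation learning and the policy can be sketched as a loop. The policy picks an item using the current factors, and the factors are then updated only from the feedback on that item. This is a hypothetical illustration, not the specific algorithm surveyed in the work above. It assumes an MF prediction `u.v`, a count-based exploration bonus, and plain SGD. The names `recommend`, `sgd_update`, `run`, `bonus`, `lr`, and `reg` are all illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def recommend(user_vec, item_vecs, counts, bonus):
    """Score items by the MF prediction plus a bonus that shrinks with exposure."""
    def score(i):
        return dot(user_vec, item_vecs[i]) + bonus / math.sqrt(counts[i] + 1)
    return max(range(len(item_vecs)), key=score)

def sgd_update(user_vec, item_vec, reward, lr, reg):
    """One in-place SGD step on (reward - u.v)^2 + reg*(|u|^2 + |v|^2).

    The constant factor 2 of the gradient is folded into lr.
    """
    err = reward - dot(user_vec, item_vec)
    for k in range(len(user_vec)):
        u_k, v_k = user_vec[k], item_vec[k]
        user_vec[k] += lr * (err * v_k - reg * u_k)
        item_vec[k] += lr * (err * u_k - reg * v_k)

def run(true_rewards, item_vecs, user_vec, rounds, bonus=1.0, lr=0.1, reg=0.01):
    """Closed loop: the policy decides which item's factors get trained."""
    counts = [0] * len(item_vecs)
    for _ in range(rounds):
        i = recommend(user_vec, item_vecs, counts, bonus)
        sgd_update(user_vec, item_vecs[i], true_rewards[i], lr, reg)  # toy deterministic feedback
        counts[i] += 1
    return counts
```

Items that the policy never selects keep their initial factors, so the policy and the learned representation shape each other.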
Using Exploration to Alleviate Closed Loop Effects in Recommender Systems
This paper introduces the notion of closed-loop feedback and investigates its effect on both the training and the offline evaluation of recommendation models, in contrast to further exploration of users' preferences (obtained from randomly presented items).
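The closed-loop effect can be reproduced in a toy simulation. The setup is hypothetical and is not the paper's experimental protocol. A model trained only on items it chose to show never collects evidence about the other items, while a small rate of randomly presented items breaks the loop. The sketch assumes click-through estimates that start at 0 and greedy ties broken toward the lowest index. The names `simulate`, `true_ctr`, and `epsilon` are illustrative.

```python
import random

def simulate(true_ctr, rounds, epsilon, seed=0):
    """Return the set of items ever shown by an epsilon-greedy loop.

    The model learns only from the feedback it collects itself. Estimates
    start at 0 and max() keeps the first of tied items, so a purely greedy
    policy (epsilon=0) locks onto item 0 forever.
    """
    rng = random.Random(seed)
    n = len(true_ctr)
    clicks, shows = [0] * n, [0] * n
    exposed = set()
    for _ in range(rounds):
        if rng.random() < epsilon:
            item = rng.randrange(n)  # randomly presented item (exploration)
        else:
            est = [clicks[i] / shows[i] if shows[i] else 0.0 for i in range(n)]
            item = max(range(n), key=lambda i: est[i])
        exposed.add(item)
        shows[item] += 1
        clicks[item] += rng.random() < true_ctr[item]  # simulated click
    return exposed

greedy = simulate([0.05, 0.3, 0.1, 0.2, 0.15], rounds=2000, epsilon=0.0)
```

In this setup, `greedy` is `{0}`: item 0 has the lowest true click-through rate, but the greedy loop never learns that any other item is better.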
Fast distributed bandits for online recommendation systems
This paper proposes a novel distributed bandit-based algorithm called DistCLUB, which lazily creates clusters in a distributed manner, and dramatically reduces the network data sharing requirement, achieving high scalability.
Learning from Cross-Modal Behavior Dynamics with Graph-Regularized Neural Contextual Bandit
An extensive set of experiments on two benchmark datasets, as well as a large-scale proprietary dataset from a major search engine, demonstrates that the proposed GRC model effectively captures users' dynamic preferences under different settings, outperforming all baselines by a large margin.
Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
A hierarchical adaptive contextual bandit method (HATCH) is proposed to conduct the policy learning of contextual bandits with a budget constraint and it is proved that HATCH achieves a regret bound as low as .
Dynamic Clustering of Contextual Multi-Armed Bandits
This work proposes an algorithm that divides the population of users into multiple clusters and customizes the bandits to each cluster. The clustering is dynamic, i.e., users can switch from one cluster to another as their preferences change.
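"Dynamic" clustering can be illustrated with a k-means-style reassignment step. Each user is moved to the cluster whose centroid is nearest to that user's current preference estimate, and the centroids are then recomputed. This is a hypothetical sketch, not necessarily the paper's exact procedure. It assumes each user already has a preference vector estimated by their own bandit. The names `reassign_clusters` and `squared_distance` are illustrative.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def reassign_clusters(user_thetas, centroids):
    """Assign users to their nearest centroid, then recompute centroids.

    user_thetas: dict user -> current preference estimate (list of floats)
    centroids:   list of cluster centers (lists of floats)
    Users move between clusters whenever their estimates drift.
    """
    assignment = {
        user: min(range(len(centroids)),
                  key=lambda c: squared_distance(theta, centroids[c]))
        for user, theta in user_thetas.items()
    }
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [user_thetas[u] for u, a in assignment.items() if a == c]
        if members:
            new_centroids.append([sum(m[k] for m in members) / len(members)
                                  for k in range(len(old))])
        else:
            new_centroids.append(list(old))  # keep an empty cluster in place
    return assignment, new_centroids
```

Running this step periodically, for example after a fixed number of interactions, lets a user whose estimate drifts leave one cluster and join another. Each cluster's bandit can then share feedback among its current members.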