A contextual-bandit approach to personalized news article recommendation

Authors: Lihong Li, Wei Chu, John Langford, Robert E. Schapire
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. […] Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to…
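
The offline evaluation idea in the abstract (scoring a bandit algorithm against previously recorded random traffic) can be illustrated with a minimal replay sketch. It assumes each logged event is a `(context, logged_arm, reward)` tuple and that the logged arm was chosen uniformly at random; the function name `replay_evaluate` and the `choose`/`update` callbacks are hypothetical names introduced for illustration, not the authors' code.

```python
from typing import Callable, Hashable, Iterable, Tuple

# Assumed log format: (context, arm shown by the random logging policy, observed reward).
Event = Tuple[object, Hashable, float]


def replay_evaluate(
    choose: Callable[[object], Hashable],
    update: Callable[[object, Hashable, float], None],
    logged_events: Iterable[Event],
) -> Tuple[float, int]:
    """Replay logged random traffic against a bandit policy.

    An event counts only when the policy picks the same arm that was logged;
    all other events are skipped, because their reward was never observed.
    Returns (average reward over matched events, number of matched events).
    """
    matched = 0
    total_reward = 0.0
    for context, logged_arm, reward in logged_events:
        if choose(context) != logged_arm:
            continue  # reward for the policy's arm is unknown; discard the event
        update(context, logged_arm, reward)  # the policy learns only from matched events
        matched += 1
        total_reward += reward
    average = total_reward / matched if matched else 0.0
    return average, matched
```

With K arms logged uniformly at random, roughly one event in K is expected to match, so a large log is needed to obtain a stable estimate.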

Personalized Recommendation via Parameter-Free Contextual Bandits

This work proposes a parameter-free bandit strategy, which employs a principled resampling approach called online bootstrap, to derive the distribution of estimated models in an online manner and demonstrates the effectiveness of the proposed algorithm in terms of the click-through rate.
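
The summary does not say which form of online bootstrap is used. As a rough illustration only, the sketch below uses one common variant: every new observation is added to each bootstrap replica with an independent Poisson(1) weight, so the replicas together approximate a distribution over the estimate. The class `BootstrapMean`, the helper `sample_poisson`, and the fallback value 0.5 are assumptions made for this example.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class BootstrapMean:
    """Keeps several bootstrap replicas of a running mean reward (hypothetical helper)."""

    def __init__(self, n_replicas: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.weight = [0.0] * n_replicas  # total Poisson weight seen by each replica
        self.total = [0.0] * n_replicas   # weighted reward sum of each replica

    def update(self, reward: float) -> None:
        # Each replica sees the observation a Poisson(1)-distributed number of times.
        for b in range(len(self.weight)):
            w = sample_poisson(1.0, self.rng)
            self.weight[b] += w
            self.total[b] += w * reward

    def replica_means(self, default: float = 0.5) -> list:
        # Replicas that have seen no data fall back to an assumed default value.
        return [t / w if w > 0 else default for t, w in zip(self.total, self.weight)]
```

A bandit strategy could then pick one replica at random per round and act greedily on its estimate, which injects exploration without an explicit exploration parameter; that usage is an assumption here, not a description of the paper's algorithm.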

Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services

The experimental results show that CoLin outperforms Hybrid-LinUCB and LinUCB, reporting cumulative regret of 8.950 for LastFm, 60.34 for MovieLens20M, and 34.10 for Yahoo Front Page Today Module.

Ensemble contextual bandits for personalized recommendation

A meta-bandit paradigm is employed that places a hyper bandit over the base bandits to explicitly explore/exploit the relative importance of the base bandits based on user feedback, yielding a robust predicted click-through rate (CTR) for web objects.

A Contextual Bandit Approach to Personalized Online Recommendation via Sparse Interactions

This paper proposes a novel approach, named SAOR, to make online recommendations via sparse interactions that uses positive and negative responses to build the user preference model, ignoring all non-responses.

Personalized Recommendation via Parameter-Free

This paper formulates personalized recommendation as a contextual bandit problem to solve the exploration/exploitation dilemma and proposes a parameter-free bandit strategy, which employs a principled resampling approach called online bootstrap to derive the distribution of estimated models in an online manner.

Data-driven evaluation of Contextual Bandit algorithms and applications to Dynamic Recommendation

It is shown that a bootstrap-based approach makes it possible to significantly reduce this bias and, more importantly, to control it; the paper also comments on the results of an experiment of unprecedented scale: a public challenge.

Collaborative Filtering Bandits

This work investigates an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings, showing scalability and increased prediction performance over state-of-the-art methods for clustering bandits on medium-size real-world datasets.

Stochastic Models to Improve E-News Recommender Systems

First results indicate that models that use only information from the recent past perform best; the paper also looks at whether this holds across varying data contexts and at how to generate more personalized models.

Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation

A novel contextual combinatorial bandit method called UBM-LinUCB is proposed to address two issues related to positions by adopting the User Browsing Model (UBM), a click model for web search.



Personalized recommendation on dynamic content using predictive bilinear models

This work proposes a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively and results in an offline model with light computational overhead compared with other recommender systems that require online re-training.

Naïve filterbots for robust cold-start recommendations

This work improves the scalability and performance of a previous approach to handling cold-start situations that uses filterbots, or surrogate users that rate items based only on user or item attributes, and shows that introducing a very small number of simple filterbots helps make CF algorithms more robust.

Google news personalization: scalable online collaborative filtering

This paper describes the approach to collaborative filtering for generating personalized recommendations for users of Google News using MinHash clustering, Probabilistic Latent Semantic Indexing, and covisitation counts, and combines recommendations from different algorithms using a linear model.

Just-in-time contextual advertising

Empirical evaluation shows that matching ads on the basis of a carefully selected 5% fraction of the page text sacrifices only 1%-3% in ad relevance and is competitive with matching based on the entire page content.

Online Models for Content Optimization

A new content publishing system that selects articles to serve to a user, choosing from an editorially programmed pool that is frequently refreshed, is described and deployed on a major Yahoo! portal, and significantly increases the number of user clicks over the original manual approach.

Explore/Exploit Schemes for Web Content Optimization

A Bayesian solution is developed for finding the optimal trade-off between explore and exploit in web content publishing applications, where a dynamic set of items with short lifetimes, delayed feedback, and non-stationary reward distributions are typical.

The Adaptive Web, Methods and Strategies of Web Personalization

This paper presents a meta-modelling architecture for the adaptive web that automates the labor-intensive, and therefore time-consuming and expensive, process of manually cataloging content on the web.

Efficient bandit algorithms for online multiclass prediction

The Banditron can learn in a multiclass classification setting with "bandit" feedback, which reveals only whether the prediction made by the algorithm was correct (but does not necessarily reveal the true label).
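
To make the "bandit" feedback setting concrete, here is a minimal sketch of a single Banditron-style round, written from the commonly stated form of the algorithm (a linear multiclass predictor, an exploration rate `gamma`, and an importance-weighted perceptron update). The function name `banditron_step` and its signature are hypothetical. The true label is passed in only so the example can simulate the environment; the update itself uses only the correct/incorrect bit.

```python
import random


def banditron_step(W, x, true_label, gamma, rng):
    """Run one round of a Banditron-style update in place on W (k rows, d columns).

    Returns (predicted_label, was_correct).
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    y_hat = max(range(k), key=lambda r: scores[r])  # greedy label

    # Exploration distribution: mostly the greedy label, gamma spread uniformly.
    probs = [gamma / k + (1.0 - gamma) * (1.0 if r == y_hat else 0.0) for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]  # label actually predicted

    # Bandit feedback: the learner only finds out whether y_tilde was right.
    correct = y_tilde == true_label

    for r in range(k):
        coeff = 0.0
        if correct and r == y_tilde:
            coeff += 1.0 / probs[r]  # importance-weighted credit for the predicted label
        if r == y_hat:
            coeff -= 1.0             # perceptron-style penalty on the greedy label
        if coeff != 0.0:
            for j in range(len(x)):
                W[r][j] += coeff * x[j]
    return y_tilde, correct
```

In expectation over the sampled label, this update matches the full-information multiclass perceptron update, which is why a single bit of feedback is enough to make progress.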

A case study of behavior-driven conjoint analysis on Yahoo!: front page today module

A successful large-scale case study of conjoint analysis on click-through streams in a real-world application at Yahoo! identifies users' heterogeneous preferences from millions of click/view events and builds predictive models to classify new users into segments with distinct behavior patterns.

Sample mean based index policies with O(log n) regret for the multi-armed bandit problem

R. Agrawal. Advances in Applied Probability, 1995. (Computer Science, Mathematics)
This paper constructs index policies that depend on the rewards from each arm only through their sample mean and that achieve O(log n) regret with a constant based on the Kullback–Leibler number.
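
As an illustration of what a sample-mean-based index policy looks like, the sketch below uses the well-known UCB1 index (sample mean plus a confidence bonus). UCB1 is swapped in here as a familiar example; it is not the exact index construction from this paper. The function names and the reward callback are hypothetical.

```python
import math


def ucb1_choose(counts, sums, t):
    """Pick the arm with the largest index: sample mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(reward_fn, n_arms, horizon):
    """Play `horizon` rounds and return how often each arm was pulled."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        arm = ucb1_choose(counts, sums, t)
        reward = reward_fn(arm)  # assumed to return a reward in [0, 1]
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The index depends on past rewards only through each arm's sample mean and pull count, so the policy needs constant memory per arm.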