Corpus ID: 128358546

A Survey on Practical Applications of Multi-Armed and Contextual Bandits

@article{Bouneffouf2019ASO,
  title={A Survey on Practical Applications of Multi-Armed and Contextual Bandits},
  author={Djallel Bouneffouf and Irina Rish},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.10040}
}
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical…


Toward Optimal Solution for the Context-Attentive Bandit Problem
A novel algorithm is derived, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting; an extensive empirical evaluation demonstrates the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
AutoBandit: A Meta Bandit Online Learning System
An intelligent system equipped with many out-of-the-box MAB algorithms that automatically and adaptively chooses the best one, with suitable hyperparameters, online; it is effective in helping a growing application continuously maximize cumulative reward over its whole life-cycle.
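The summary above does not spell out AutoBandit's actual selection rule, so the following is only a minimal sketch of the bandit-over-bandits idea it describes, assuming a UCB1 meta-layer over a few epsilon-greedy base learners with different exploration rates; every name and parameter here is illustrative.

```python
# Minimal sketch of a bandit-over-bandits meta-learner. AutoBandit's real
# selection rule is not specified in the summary; this version uses UCB1
# over a set of epsilon-greedy base learners as an assumed stand-in.
import math
import random

class EpsGreedy:
    """Base MAB learner: epsilon-greedy over n_arms arms."""
    def __init__(self, n_arms, eps):
        self.eps = eps
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        if 0 in self.counts or random.random() < self.eps:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def run_meta_bandit(bases, pull, horizon):
    """Each round, UCB1 picks a base learner; that learner picks an arm;
    the observed reward credits both the learner and the meta-layer."""
    n = len(bases)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        if 0 in counts:                       # try every base learner once
            i = counts.index(0)
        else:                                 # UCB1 index over base learners
            i = max(range(n), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        arm = bases[i].select()
        r = pull(arm)
        bases[i].update(arm, r)
        counts[i] += 1
        sums[i] += r

# Usage: three hyperparameter settings (exploration rates) compete online.
means = [0.2, 0.5, 0.8]
pull = lambda a: 1.0 if random.random() < means[a] else 0.0
run_meta_bandit([EpsGreedy(3, e) for e in (0.01, 0.1, 0.3)], pull, 5000)
```

The key design choice is that the meta-layer treats each base learner as an arm and credits it with whatever reward its chosen arm returns.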
COM-MABs: From Users' Feedback to Recommendation
A novel reward-computing process, BUSBC, is proposed, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms (by up to 16.2%); experiments with several feedback strategies from the literature on three real-world application datasets confirm the propositions.
Contextual Bandit with Missing Rewards
Unlike standard contextual bandit methods, this work leverages clustering to estimate missing rewards, and is thus able to learn from each incoming event, even those with missing rewards.
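The exact clustering and imputation scheme is not given in this summary, so below is a hedged sketch of the general idea under simple assumptions: contexts are assigned to the nearest of k fixed centroids (here assumed given), and a missing reward is imputed with that cluster's running mean, so every incoming event can still drive an update.

```python
# Hedged sketch of learning from events with missing rewards: the paper's
# exact clustering scheme is not given here, so contexts are assigned to
# the nearest of k fixed centroids (assumed known) and a missing reward is
# imputed with that cluster's running mean reward.
import numpy as np

class ClusterImputer:
    def __init__(self, centroids):
        self.centroids = centroids            # (k, d) array of cluster centers
        self.counts = np.zeros(len(centroids))
        self.sums = np.zeros(len(centroids))

    def cluster_of(self, x):
        return int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))

    def observe(self, x, reward=None):
        """Return the observed reward, or an imputed one if it is missing."""
        c = self.cluster_of(x)
        if reward is None:                    # missing: impute cluster mean
            return self.sums[c] / self.counts[c] if self.counts[c] else 0.0
        self.counts[c] += 1                   # observed: update cluster stats
        self.sums[c] += reward
        return reward

# Usage: one observed event, then one missing-reward event from the same context.
rng = np.random.default_rng(0)
imputer = ClusterImputer(centroids=rng.normal(size=(5, 8)))
x = rng.normal(size=8)
r_seen = imputer.observe(x, reward=1.0)       # reward observed
r_imputed = imputer.observe(x)                # reward missing, imputed
```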
Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit
The META algorithm is developed, which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations.
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
A novel approach reducing the number of explicit feedback signals required by Combinatorial Multi-Armed Bandit (COM-MAB) algorithms is proposed and evaluated using three distinct strategies.
Lifelong Learning in Multi-Armed Bandits
This paper proposes a bandit-over-bandit approach with greedy algorithms, performs extensive experimental evaluations in both stationary and non-stationary environments, and applies the solution to the mortal bandit problem, showing empirical improvement over previous work.
Solving Multi-Arm Bandit Using a Few Bits of Communication
This paper proposes QuBan, a generic reward quantization algorithm that applies on top of any (no-regret) multi-armed bandit algorithm; its upper bounds hold under mild assumptions on the reward distributions, over all current (and future) MAB algorithms, including those used in contextual bandits.
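QuBan's precise encoding is not reproduced in this summary; the sketch below shows the simplest instance of the reward-quantization idea it builds on, stochastic rounding of a bounded reward to a single unbiased bit, which a downstream MAB algorithm can consume in place of the raw reward.

```python
# Simplest instance of reward quantization (not QuBan's exact scheme):
# a reward r in [0, 1] is encoded as one Bernoulli(r) bit, so E[bit] = r
# and a downstream MAB algorithm consuming the bit sees an unbiased reward.
import random

def quantize_reward(r: float) -> int:
    """Stochastically round r in [0, 1] to a single bit with E[bit] = r."""
    return 1 if random.random() < r else 0

# Sanity check: the empirical mean of the bits approaches r.
r = 0.37
bits = [quantize_reward(r) for _ in range(100_000)]
print(sum(bits) / len(bits))   # ~0.37
```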
Online learning with Corrupted context: Corrupted Contextual Bandits
This work proposes to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism to address the corrupted-context setting where the context used at each decision may be corrupted ("useless context").
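One simple way to combine the two components (an illustrative assumption, not necessarily the paper's exact rule) is a two-armed Thompson-sampling chooser that decides, each round, whether to follow the contextual policy or the context-free one, so the plain MAB takes over when corrupted contexts make the contextual policy unreliable:

```python
# Illustrative combination (an assumption, not necessarily the paper's rule):
# a two-armed Thompson-sampling chooser decides each round between the
# contextual policy and the context-free MAB, so the MAB dominates when
# corrupted contexts make the contextual policy unreliable.
import random

class BetaChooser:
    """Bernoulli Thompson sampling over two candidate policies."""
    def __init__(self, n=2):
        self.a = [1.0] * n      # Beta posterior: successes + 1
        self.b = [1.0] * n      # Beta posterior: failures + 1

    def pick(self):
        samples = [random.betavariate(self.a[i], self.b[i])
                   for i in range(len(self.a))]
        return samples.index(max(samples))

    def update(self, i, reward):
        self.a[i] += reward
        self.b[i] += 1.0 - reward

def hybrid_round(chooser, contextual_policy, mab_policy, context, pull):
    i = chooser.pick()
    arm = contextual_policy(context) if i == 0 else mab_policy()
    r = pull(arm)               # binary reward assumed
    chooser.update(i, r)
    return arm, r

# Usage with toy stand-in policies; the context is corrupted half the time.
chooser = BetaChooser()
ctx_policy = lambda ctx: 0 if ctx is None else int(ctx > 0)
mab_policy = lambda: random.randrange(2)
pull = lambda arm: 1.0 if random.random() < (0.7 if arm == 1 else 0.3) else 0.0
for _ in range(1000):
    ctx = None if random.random() < 0.5 else random.gauss(0, 1)
    hybrid_round(chooser, ctx_policy, mab_policy, ctx, pull)
```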
Double-Linear Thompson Sampling for Context-Attentive Bandits
An online learning framework, motivated by various practical applications, in which due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration, is analyzed, and a novel algorithm, called Context-Attentive Thompson Sampling (CATS), is derived.

References

Showing 1-10 of 51 references
Context Attentive Bandits: Contextual Bandit with Restricted Context
This work adapts the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of the restricted-context setting, and proposes two novel algorithms, called Thompson Sampling with Restricted Context (TSRC) and Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and non-stationary environments, respectively.
Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens
A multi-armed-bandit-inspired, pool-based active learning algorithm for binary classification: a sequential algorithm that in each round assigns a sampling distribution to the pool, samples one point from this distribution, and queries the oracle for the label of the sampled point.
A contextual-bandit approach to personalized news article recommendation
This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
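The algorithm introduced in this paper is LinUCB; a compact version of its disjoint-model variant, with one ridge-regression model per arm and an upper-confidence bonus scaled by an exploration parameter alpha, looks roughly as follows (the toy usage at the end is illustrative):

```python
# Compact disjoint-model LinUCB: one ridge-regression model per arm,
# arm choice by predicted reward plus an alpha-scaled confidence bonus.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]     # I + sum of x x^T
        self.b = [np.zeros(d) for _ in range(n_arms)]   # sum of r * x

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                           # ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage on a toy linear problem with 3 articles and 5 user features.
rng = np.random.default_rng(1)
true_theta = rng.normal(size=(3, 5))
agent = LinUCB(n_arms=3, d=5, alpha=0.5)
for _ in range(2000):
    x = rng.normal(size=5)
    a = agent.select(x)
    agent.update(a, x, true_theta[a] @ x + rng.normal(scale=0.1))
```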
Large-Scale Bandit Approaches for Recommender Systems
This paper proposes two large-scale bandit approaches for situations where no prior information is available, and theoretically proves that these approaches converge to optimal item recommendations in the long run.
Thompson Sampling for Contextual Bandits with Linear Payoffs
A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions is designed and analyzed, for the setting where contexts are provided by an adaptive adversary.
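The scheme analyzed there maintains a shared linear model and, each round, samples a parameter vector from a Gaussian centered at the ridge estimate; a compact rendition, with the sampling scale v treated here as a plain tuning parameter, is:

```python
# Compact linear Thompson Sampling: a shared linear model; each round a
# parameter vector is sampled from a Gaussian centered at the ridge
# estimate, and the arm maximizing the sampled payoff is played. The
# scale v is treated as a plain tuning parameter in this sketch.
import numpy as np

class LinTS:
    def __init__(self, d, v=0.5, rng=None):
        self.v = v
        self.B = np.eye(d)          # I + sum of x x^T
        self.f = np.zeros(d)        # sum of r * x
        self.rng = rng or np.random.default_rng()

    def select(self, arm_contexts):
        """arm_contexts: (n_arms, d) array of per-arm feature vectors."""
        B_inv = np.linalg.inv(self.B)
        mu_hat = B_inv @ self.f
        mu_tilde = self.rng.multivariate_normal(mu_hat, self.v**2 * B_inv)
        return int(np.argmax(arm_contexts @ mu_tilde))

    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x

# Usage: 4 arms, 6-dimensional contexts drawn fresh every round.
rng = np.random.default_rng(2)
theta_star = rng.normal(size=6)
agent = LinTS(d=6, v=0.3, rng=rng)
for _ in range(2000):
    X = rng.normal(size=(4, 6))
    a = agent.select(X)
    agent.update(X[a], X[a] @ theta_star + rng.normal(scale=0.1))
```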
Risk-aware multi-armed bandit problem with application to portfolio selection
This paper incorporates risk awareness into the classic multi-armed bandit setting and introduces an algorithm for constructing a portfolio that achieves a balance between risk and return.
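This summary does not give the paper's exact risk criterion, so the sketch below uses a generic mean-variance index (empirical mean, minus a risk penalty rho times empirical variance, plus a UCB-style exploration bonus) as one common way to make arm selection risk-aware; rho and c are illustrative parameters, not the paper's.

```python
# Generic risk-aware index (illustrative, not the paper's exact criterion):
# empirical mean minus rho * empirical variance, plus a UCB-style bonus.
import math
import random

def mean_variance_ucb(history, t, rho=1.0, c=2.0):
    """history: list of per-arm reward lists; returns the arm to pull."""
    def index(rewards):
        n = len(rewards)
        if n == 0:
            return float("inf")               # force initial exploration
        mean = sum(rewards) / n
        var = sum((r - mean) ** 2 for r in rewards) / n
        return mean - rho * var + math.sqrt(c * math.log(t) / n)
    return max(range(len(history)), key=lambda a: index(history[a]))

# Usage: two assets with the same mean return but different volatility;
# the risk-aware index concentrates pulls on the steadier asset.
history = [[], []]
for t in range(1, 3000):
    a = mean_variance_ucb(history, t, rho=2.0)
    r = random.gauss(0.05, 0.01) if a == 0 else random.gauss(0.05, 0.2)
    history[a].append(r)
print([len(h) for h in history])
```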
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
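Successive halving is the subroutine at Hyperband's core: start many randomly sampled configurations on a small budget, keep the best 1/eta fraction, and multiply the survivors' budget by eta; Hyperband proper additionally sweeps several (n, budget) brackets. A minimal sketch, with sample_config and evaluate as assumed user-supplied callables:

```python
# Successive halving, the core subroutine of Hyperband: evaluate many
# random configurations on a small budget, keep the best 1/eta fraction,
# and give the survivors eta times the budget. sample_config and evaluate
# are assumed user-supplied callables.
import random

def successive_halving(sample_config, evaluate, n=27, min_budget=1, eta=3):
    """sample_config() -> config; evaluate(config, budget) -> loss."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        scored = sorted((evaluate(c, budget), c) for c in configs)
        configs = [c for _, c in scored[:max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

# Usage with a toy objective: loss shrinks with budget, offset by distance
# from an (unknown) optimum at 0.3.
best = successive_halving(
    sample_config=lambda: random.random(),
    evaluate=lambda cfg, b: abs(cfg - 0.3) + 1.0 / b,
)
print(best)   # close to 0.3
```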
Portfolio Choices with Orthogonal Bandit Learning
This paper presents a bandit algorithm for making online portfolio choices by effectively exploiting correlations among multiple arms, and derives the optimal portfolio strategy, representing a combination of passive and active investments according to a risk-adjusted reward function.
A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System
This paper introduces an algorithm based on dynamic exploration/exploitation that can adaptively balance the two aspects by deciding which user situation is most relevant for exploration or exploitation.
Collaborative Clustering through Constrained Networks using Bandit Optimization
This paper proposes a collaborative peer-to-peer clustering algorithm based on the principle of non-stochastic multi-armed bandits, to assess in real time which algorithms or views can bring useful information.