Corpus ID: 220250845

Online learning with Corrupted context: Corrupted Contextual Bandits

@article{Bouneffouf2020OnlineLW,
  title={Online learning with Corrupted context: Corrupted Contextual Bandits},
  author={Djallel Bouneffouf},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.15194}
}
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain online settings, including clinical-trial and ad-recommendation applications. In order to address the corrupted-context setting, we propose to combine the standard contextual bandit approach with a classical multi-armed…
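The abstract is cut off mid-sentence, but the stated combination can be sketched. Below is a minimal illustration, not the paper's exact algorithm: a context-free Thompson Sampling bandit serves as a fallback whenever the context is flagged as corrupted, while a simple per-arm ridge-regression policy handles clean contexts. The corruption flag `is_corrupted`, both sub-policies, and all class names are illustrative assumptions.

```python
import numpy as np

class BetaBandit:
    """Context-free Thompson Sampling for Bernoulli rewards."""
    def __init__(self, n_arms):
        self.a = np.ones(n_arms)  # Beta "success" counts
        self.b = np.ones(n_arms)  # Beta "failure" counts
    def select(self):
        return int(np.argmax(np.random.beta(self.a, self.b)))
    def update(self, arm, reward):
        self.a[arm] += reward
        self.b[arm] += 1.0 - reward

class RidgeGreedy:
    """Greedy contextual policy on per-arm ridge-regression estimates."""
    def __init__(self, n_arms, dim, eps=0.05):
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors
        self.eps = eps
    def select(self, x):
        if np.random.rand() < self.eps:  # light epsilon-greedy exploration
            return int(np.random.randint(len(self.A)))
        est = [x @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(est))
    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

class CorruptedContextPolicy:
    """Route each round to the contextual or the context-free policy."""
    def __init__(self, n_arms, dim):
        self.cb = RidgeGreedy(n_arms, dim)
        self.mab = BetaBandit(n_arms)
    def select(self, x, is_corrupted):
        return self.mab.select() if is_corrupted else self.cb.select(x)
    def update(self, arm, x, reward, is_corrupted):
        self.mab.update(arm, reward)        # the MAB can always learn
        if not is_corrupted:
            self.cb.update(arm, x, reward)  # contextual update only on clean context
```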

Citations

Contextual Bandit with Missing Rewards
TLDR
Unlike standard contextual bandit methods, this work leverages clustering to estimate missing rewards, and can therefore learn from every incoming event, even those with missing rewards.
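A rough sketch of the imputation idea described in this TLDR, under the assumption that contexts are clustered (e.g., by k-means) and a missing reward is replaced by the running mean reward of the nearest cluster; the class and the 0.5 fallback value are illustrative, not the paper's specification.

```python
import numpy as np

class ClusterRewardImputer:
    def __init__(self, centroids):
        self.centroids = np.asarray(centroids)  # fixed centroids, e.g. from k-means
        k = len(self.centroids)
        self.sums = np.zeros(k)    # per-cluster sum of observed rewards
        self.counts = np.zeros(k)  # per-cluster number of observed rewards
    def _cluster(self, x):
        return int(np.argmin(((self.centroids - x) ** 2).sum(axis=1)))
    def observe(self, x, reward):
        """Record an observed reward for x's nearest cluster."""
        c = self._cluster(x)
        self.sums[c] += reward
        self.counts[c] += 1
    def impute(self, x):
        """Estimate a missing reward from cluster statistics."""
        c = self._cluster(x)
        return self.sums[c] / self.counts[c] if self.counts[c] > 0 else 0.5
```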
Corrupted Contextual Bandits with Action Order Constraints
TLDR
A meta-algorithm is proposed that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit, together with a simple correlation mechanism that captures action-to-action transition probabilities, allowing more efficient exploration of time-correlated actions.
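The correlation mechanism mentioned above might look like the following: empirical action-to-action transition counts bias exploration toward actions that tend to follow the previous one. The blending weight `alpha` and the additive smoothing are assumptions, not values from the paper.

```python
import numpy as np

class ActionTransitionModel:
    def __init__(self, n_arms, alpha=0.3, smoothing=1.0):
        # Smoothed counts of observed action -> next-action transitions.
        self.counts = np.full((n_arms, n_arms), smoothing)
        self.alpha = alpha
    def update(self, prev_action, action):
        self.counts[prev_action, action] += 1.0
    def blend(self, prev_action, bandit_scores):
        """Mix bandit scores with the empirical next-action distribution."""
        trans = self.counts[prev_action] / self.counts[prev_action].sum()
        return (1 - self.alpha) * np.asarray(bandit_scores) + self.alpha * trans
```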
Online Semi-Supervised Learning with Bandit Feedback
TLDR
This work formulates a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations, and takes the best of both approaches to develop a multi-GCN-embedded contextual bandit.
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context
TLDR
An algorithm is presented that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit, and that captures the time correlation of states by iteratively learning the action-reward transition model, allowing for efficient exploration of actions.
Adaptive and Reinforcement Learning Approaches for Online Network Monitoring and Analysis
TLDR
ADAM & RAL are applied to the real-time detection of network attacks in Internet traffic; it is shown that high detection accuracy can be maintained even under concept drift, while limiting the amount of labeled data needed for learning.
Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling
TLDR
A scalable Nyström-based clustering algorithm is proposed, with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic for when to use it; it yields low-rank approximations on test datasets competitive with other state-of-the-art methods.
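For reference, a generic Nyström low-rank approximation is sketched below; the CMS3 sampling rule itself is not reproduced here (uniform landmark sampling stands in for it), so treat the sampling step as a placeholder.

```python
import numpy as np

def nystrom_approximation(K, m, rng=None):
    """Rank-m Nystrom approximation of a kernel matrix K: K ~ C @ pinv(W) @ C.T."""
    rng = rng or np.random.default_rng(0)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # landmark columns (CMS3 would pick these)
    C = K[:, idx]                               # n x m slice of K
    W = K[np.ix_(idx, idx)]                     # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T
```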
Computing the Dirichlet-Multinomial Log-Likelihood Function
TLDR
This work uses mathematical properties of the gamma function to derive a closed-form expression for the DMN log-likelihood function, which has lower computational complexity and is much faster without compromising accuracy.
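The closed form alluded to here is presumably the standard Dirichlet-multinomial log-pmf expressed through log-gamma functions; whether it matches the paper's exact reformulation is an assumption.

```python
import numpy as np
from scipy.special import gammaln  # log of the gamma function

def dirichlet_multinomial_loglik(counts, alpha):
    """Log-likelihood of counts under a Dirichlet-multinomial with parameters alpha."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    N, A = counts.sum(), alpha.sum()
    return (gammaln(N + 1) - gammaln(counts + 1).sum()   # multinomial coefficient
            + gammaln(A) - gammaln(N + A)                # normalizer ratio
            + (gammaln(counts + alpha) - gammaln(alpha)).sum())
```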
Etat de l'art sur l'application des bandits multi-bras
TLDR
A comprehensive review of the main recent developments across multiple real-world applications of bandits, identifying important current trends and providing new perspectives on the future of this flourishing field.

References

Showing 1-10 of 43 references
Context Attentive Bandits: Contextual Bandit with Restricted Context
TLDR
This work adapts the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of the restricted-context setting, and proposes two novel algorithms, Thompson Sampling with Restricted Context (TSRC) and Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively.
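An illustrative sketch of the restricted-context layer: Bernoulli Thompson Sampling decides which k features to observe each round, scored by how often observing a feature coincided with good rewards. The binary usefulness signal passed to `update` is an assumption about the available feedback, not the paper's exact mechanism.

```python
import numpy as np

class FeatureSelectorTS:
    def __init__(self, n_features, k):
        self.a = np.ones(n_features)  # Beta "helpful" counts per feature
        self.b = np.ones(n_features)  # Beta "unhelpful" counts per feature
        self.k = k
    def select_features(self):
        """Sample feature usefulness and observe the top-k features."""
        samples = np.random.beta(self.a, self.b)
        return np.argsort(samples)[-self.k:]
    def update(self, features, helpful):
        """helpful: 1 if the round's reward met expectations, else 0."""
        self.a[features] += helpful
        self.b[features] += 1 - helpful
```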
Contextual Bandit with Adaptive Feature Extraction
TLDR
The approach starts with offline pre-training on an unlabeled history of contexts, followed by online selection and adaptation of encoders: it selects the most appropriate encoding function to extract a feature vector, which becomes the input to a contextual bandit.
A Survey on Practical Applications of Multi-Armed and Contextual Bandits
TLDR
A taxonomy of common MAB-based applications is introduced and the state of the art for each of those domains is summarized, identifying important current trends and providing new perspectives pertaining to the future of this exciting and fast-growing field.
A Neural Networks Committee for the Contextual Bandit Problem
TLDR
A new contextual bandit algorithm, NeuralBandit, which requires no stationarity assumptions on contexts and rewards, is presented, and two variants based on a multi-expert approach are proposed to choose the parameters of multi-layer perceptrons online.
Contextual Bandit for Active Learning: Active Thompson Sampling
TLDR
A sequential algorithm named Active Thompson Sampling (ATS) is proposed which, in each round, assigns a sampling distribution over the pool, samples one point from this distribution, and queries the oracle for this sample point's label.
Thompson Sampling for Contextual Bandits with Linear Payoffs
TLDR
A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary.
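For concreteness, a compact sketch of Thompson Sampling with linear payoffs in the shared-parameter form (one unknown parameter vector, one context vector per arm); the prior scale `v` is a tunable assumption.

```python
import numpy as np

class LinTS:
    def __init__(self, dim, v=0.5):
        self.B = np.eye(dim)    # posterior precision matrix
        self.f = np.zeros(dim)  # accumulated reward-weighted contexts
        self.v = v              # posterior scale (exploration level)
    def select(self, arm_contexts):
        """arm_contexts: array of shape (n_arms, dim), one row per arm."""
        mu = np.linalg.solve(self.B, self.f)
        theta = np.random.multivariate_normal(mu, self.v**2 * np.linalg.inv(self.B))
        return int(np.argmax(arm_contexts @ theta))
    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x
```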
Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit
TLDR
The META algorithm is developed, which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations.
Using Contextual Bandits with Behavioral Constraints for Constrained Online Movie Recommendation
TLDR
A novel online system is detailed, based on an extension of the contextual bandit framework, that learns a set of behavioral constraints by observation and uses these constraints as a guide when making decisions online, while remaining reactive to reward feedback.
A contextual-bandit approach to personalized news article recommendation
TLDR
This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
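This is the paper behind the disjoint LinUCB model; a compact rendering of that model is sketched below, where each arm keeps its own ridge-regression estimate and the arm with the highest upper confidence bound is played (the exploration width `alpha` is a tuning parameter).

```python
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors
        self.alpha = alpha                               # exploration width
    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate for this arm
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))
    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```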