# A Neural Networks Committee for the Contextual Bandit Problem

@inproceedings{Allesiardo2014ANN,
  title     = {A Neural Networks Committee for the Contextual Bandit Problem},
  author    = {Robin Allesiardo and Rapha{\"e}l F{\'e}raud and Djallel Bouneffouf},
  booktitle = {ICONIP},
  year      = {2014}
}

This paper presents a new contextual bandit algorithm, NeuralBandit, which does not require stationarity assumptions on contexts and rewards. [...] Two variants, based on a multi-expert approach, are proposed to choose the parameters of multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset, with and without stationarity of rewards.
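As a rough illustration of the setting the abstract describes (a committee of neural reward estimators selecting actions under bandit feedback), the sketch below trains one small perceptron per arm with epsilon-greedy exploration. This is a generic illustration under simplified assumptions, not the paper's NeuralBandit algorithm: the network size, learning rate, and exploration scheme here are hypothetical choices, and the multi-expert parameter selection is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMLPArm:
    """One-hidden-layer perceptron estimating the reward of a single arm."""
    def __init__(self, d, hidden=8, lr=0.05):
        self.W1 = rng.normal(0, 0.1, (hidden, d))
        self.w2 = rng.normal(0, 0.1, hidden)
        self.lr = lr

    def predict(self, x):
        self.h = np.tanh(self.W1 @ x)          # cache hidden activations
        return float(self.w2 @ self.h)

    def update(self, x, reward):
        # one SGD step on the squared error (prediction - reward)^2
        err = self.predict(x) - reward
        grad_w2 = err * self.h
        grad_W1 = err * np.outer(self.w2 * (1 - self.h ** 2), x)
        self.w2 -= self.lr * grad_w2
        self.W1 -= self.lr * grad_W1

def run(T=2000, d=5, K=3, eps=0.1):
    """Epsilon-greedy contextual bandit over a committee of per-arm MLPs,
    on a simulated linear reward model (the true model is hidden from the
    learner and only used to generate rewards)."""
    arms = [TinyMLPArm(d) for _ in range(K)]
    true_w = rng.normal(size=(K, d))
    total = 0.0
    for _ in range(T):
        x = rng.normal(size=d)
        if rng.random() < eps:                 # explore uniformly
            a = int(rng.integers(K))
        else:                                  # exploit the best estimate
            a = int(np.argmax([arm.predict(x) for arm in arms]))
        r = float(true_w[a] @ x) + 0.1 * rng.normal()
        arms[a].update(x, r)                   # only the pulled arm learns
        total += r
    return total / T
```

Note that, as in the paper's setting, only the chosen arm's estimator receives feedback on each round.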


#### 56 Citations

Contextual Bandit with Missing Rewards

- Computer Science, Mathematics
- ArXiv
- 2020

Unlike standard contextual bandit methods, this work leverages clustering to estimate missing rewards, enabling learning from each incoming event, even those with missing rewards.

Adaptive Representation Selection in Contextual Bandit.

- Computer Science, Mathematics
- 2018

An approach is proposed for improving the performance of contextual bandits in this setting via adaptive, dynamic representation learning, combining offline pre-training on an unlabeled history of contexts with online selection and modification of embedding functions.

Hyper-parameter Tuning for the Contextual Bandit

- Computer Science, Mathematics
- ArXiv
- 2020

Two algorithms are presented that use a bandit to find the optimal exploration rate of a contextual bandit algorithm, which the authors hope is a first step toward the automation of multi-armed bandit algorithms.

Online Semi-Supervised Learning with Bandit Feedback

- Computer Science, Mathematics
- ArXiv
- 2020

This work formulates a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations, and takes the best of both approaches to develop a multi-GCN-embedded contextual bandit.

Online learning with Corrupted context: Corrupted Contextual Bandits

- Computer Science, Mathematics
- ArXiv
- 2020

This work proposes combining the standard contextual bandit approach with a classical multi-armed bandit mechanism to address the corrupted-context setting, where the context used at each decision may be corrupted ("useless context").

Adaptive Representation Selection in Contextual Bandit with Unlabeled History

- Computer Science
- ArXiv
- 2018

An approach is proposed for improving the performance of contextual bandits in this setting via adaptive, dynamic representation learning, combining offline pre-training on an unlabeled history of contexts with online selection and modification of embedding functions.

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

- Computer Science, Mathematics
- ArXiv
- 2019

The NeuralUCB algorithm is proposed, which leverages the representation power of deep neural networks and uses a neural-network-based random feature mapping to construct an upper confidence bound (UCB) on reward for efficient exploration.

Double-Linear Thompson Sampling for Context-Attentive Bandits

- Computer Science
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021

An online learning framework is analyzed, motivated by various practical applications where, due to observation costs, only a small subset of a potentially large number of context variables can be observed at each iteration, and a novel algorithm, Context-Attentive Thompson Sampling (CATS), is derived.

Context Attentive Bandits: Contextual Bandit with Restricted Context

- Computer Science, Mathematics
- IJCAI
- 2017

This work adapts the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of the restricted-context setting, and proposes two novel algorithms, Thompson Sampling with Restricted Context (TSRC) and Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively.

Contextual Bandit with Adaptive Feature Extraction

- Computer Science
- 2018 IEEE International Conference on Data Mining Workshops (ICDMW)
- 2018

The approach starts with offline pre-training on an unlabeled history of contexts, followed by online selection and adaptation of encoders: it selects the most appropriate encoding function to extract a feature vector, which becomes the input to a contextual bandit.

#### References

Showing 1–10 of 27 references.

Efficient bandit algorithms for online multiclass prediction

- Computer Science
- ICML '08
- 2008

The Banditron can learn in a multiclass classification setting with "bandit" feedback, which only reveals whether the prediction made by the algorithm was correct (but does not necessarily reveal the true label).

Efficient Optimal Learning for Contextual Bandits

- Computer Science, Mathematics
- UAI
- 2011

This work provides the first efficient algorithm with optimal regret: it uses a cost-sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules among which the oracle might choose.

Thompson Sampling for Contextual Bandits with Linear Payoffs

- Computer Science, Mathematics
- ICML
- 2013

A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions is designed and analyzed, for the case where the contexts are provided by an adaptive adversary.
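The core loop of Thompson Sampling with linear payoffs can be sketched as follows: keep a Gaussian posterior over each arm's weight vector, sample a weight vector per arm, and play the arm whose sample scores the context highest. This is a simplified illustration of the scheme analyzed by Agrawal and Goyal, not their exact algorithm; the posterior scale `v` below is a hand-picked constant, not the paper's carefully tuned confidence parameter.

```python
import numpy as np

class LinearTS:
    """Thompson Sampling with a Gaussian posterior per arm (simplified sketch)."""
    def __init__(self, n_arms, d, v=0.5, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.v = v                                     # posterior scale (assumed constant)
        self.B = [np.eye(d) for _ in range(n_arms)]    # per-arm precision matrices
        self.f = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted context sums

    def select(self, x):
        scores = []
        for B, f in zip(self.B, self.f):
            mu = np.linalg.solve(B, f)                 # posterior mean
            cov = self.v ** 2 * np.linalg.inv(B)       # posterior covariance
            theta = self.rng.multivariate_normal(mu, cov)  # posterior sample
            scores.append(theta @ x)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Bayesian linear-regression update for the pulled arm only
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

Exploration here comes entirely from the posterior sampling: arms with little data have wide posteriors and are occasionally sampled high, so no explicit epsilon is needed.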

Contextual Bandits with Linear Payoff Functions

- Mathematics, Computer Science
- AISTATS
- 2011

An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
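The upper-confidence-bound rule this bound applies to can be sketched in a few lines: keep a ridge-regression estimate per arm and add an exploration bonus proportional to the estimate's uncertainty in the current context. This is a LinUCB-style illustration under simplified assumptions; `alpha` below is a hand-picked constant, not the confidence width used in the paper's analysis.

```python
import numpy as np

def linucb_choose(A_list, b_list, x, alpha=1.0):
    """Pick the arm maximizing  theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
    scores = []
    for A, b in zip(A_list, b_list):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                          # ridge-regression estimate
        bonus = alpha * np.sqrt(x @ A_inv @ x)     # uncertainty in direction x
        scores.append(theta @ x + bonus)
    return int(np.argmax(scores))

def linucb_update(A_list, b_list, arm, x, reward):
    """Rank-one update of the pulled arm's design matrix and response vector."""
    A_list[arm] += np.outer(x, x)
    b_list[arm] += reward * x
```

An arm that has rarely been pulled in directions like `x` keeps a large bonus, so the rule keeps exploring it until its estimate is tight enough to be ruled out.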

Mortal Multi-Armed Bandits

- Computer Science
- NIPS
- 2008

A new variant of the k-armed bandit problem is presented, in which arms have (stochastic) lifetimes after which they expire, motivated by e-commerce applications, together with an optimal algorithm for the state-aware (deterministic reward function) case.

A contextual-bandit approach to personalized news article recommendation

- Computer Science
- WWW '10
- 2010

This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

- Mathematics, Computer Science
- NIPS
- 2007

An algorithm is presented for multi-armed bandits with observable side information that requires no knowledge of a time horizon; the regret incurred by Epoch-Greedy is controlled by a sample-complexity bound for a hypothesis class.

Playing Atari with Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

On-line learning for very large data sets

- Computer Science
- 2005

This paper reconsiders convergence speed in terms of how fast a learning algorithm optimizes the testing error, and shows the superiority of well-designed stochastic learning algorithms.

Regret bounds for sleeping experts and bandits

- Computer Science, Mathematics
- Machine Learning
- 2010

This work compares algorithms against the payoff obtained by the best ordering of the actions, a natural benchmark for this type of problem, and gives algorithms achieving information-theoretically optimal regret bounds with respect to the best-ordering benchmark.