# Deep Contextual Multi-armed Bandits

@article{Collier2018DeepCM, title={Deep Contextual Multi-armed Bandits}, author={Mark Collier and Hector Llorens}, journal={ArXiv}, year={2018}, volume={abs/1807.09809} }

Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty-driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling.
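The abstract describes Thompson sampling with a non-linear (deep) reward model. A minimal sketch of that idea uses Monte Carlo dropout: dropout stays active at decision time, so each stochastic forward pass acts as one sample from an approximate posterior over reward functions, and acting greedily on that sample is Thompson sampling. The toy environment (3 arms, linear true rewards), network sizes, learning rate, and SGD update below are all illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d, hidden = 3, 5, 32
true_w = rng.normal(size=(n_arms, d))              # hidden reward parameters (toy)

W1 = rng.normal(scale=0.1, size=(d, hidden))       # shared hidden layer
W2 = rng.normal(scale=0.1, size=(hidden, n_arms))  # one output unit per arm
p_drop, lr = 0.2, 0.05

def forward(x):
    """One stochastic pass with dropout kept ON: each pass is a sample
    from the approximate posterior over reward functions."""
    pre = x @ W1
    relu = np.maximum(pre, 0.0)
    mask = (rng.random(hidden) > p_drop) / (1.0 - p_drop)  # inverted dropout
    h = relu * mask
    return pre, mask, h, h @ W2

def thompson_select(x):
    # Acting greedily on a single posterior sample is Thompson sampling.
    return int(np.argmax(forward(x)[3]))

total = 0.0
for t in range(2000):
    x = rng.normal(size=d)
    a = thompson_select(x)
    r = true_w[a] @ x + rng.normal(scale=0.1)      # observe a noisy reward
    total += r
    # One SGD step on squared error for the chosen arm (fresh dropout sample).
    pre, mask, h, q = forward(x)
    err = q[a] - r
    g2 = err * h
    g1 = np.outer(x, err * W2[:, a] * mask * (pre > 0))
    W2[:, a] -= lr * g2
    W1 -= lr * g1
```

Only the chosen arm's head is updated each round, since only its reward is observed; exploration comes entirely from the dropout noise rather than from an epsilon parameter.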

#### 14 Citations

An Empirical Study of Neural Kernel Bandits

- Computer Science, Mathematics
- ArXiv
- 2021

It is proposed to directly apply NK-induced distributions to guide an upper confidence bound or Thompson sampling-based policy, and it is shown that NK bandits achieve state-of-the-art performance on highly non-linear structured data.

Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning

- Computer Science, Economics
- ArXiv
- 2020

The hedging problem is viewed as an instance of a risk-averse contextual bandit problem, for which a large body of theoretical results and well-studied algorithms are available, and it is found that the k-armed bandit model naturally fits the P&L formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning.

Deep Reinforcement Learning with Weighted Q-Learning

- Computer Science, Mathematics
- ArXiv
- 2020

This work provides the methodological advances needed to benefit from the WQL properties in Deep Reinforcement Learning (DRL), using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes.

Modeling uncertainty to improve personalized recommendations via Bayesian deep learning

- Computer Science
- 2021

An approach based on Bayesian deep learning to improve personalized recommendations by capturing the uncertainty associated with the model output and utilizing it to boost exploration in the context of Recommender Systems.

Bayesian Deep Learning Based Exploration-Exploitation for Personalized Recommendations

- Computer Science
- 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
- 2019

This paper presents an approach based on Bayesian Deep Learning to learn a compact representation of user and item attributes to guide exploitation and shows how to further boost exploration by incorporating model uncertainty with that of data uncertainty.

Bao: Learning to Steer Query Optimizers

- Computer Science
- ArXiv
- 2020

Bao combines modern tree convolutional neural networks with Thompson sampling, a decades-old and well-studied reinforcement learning algorithm, to take advantage of the wisdom built into existing query optimizers by providing per-query optimization hints.

Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem

- Biology
- Natural Science
- 2019

This work employs machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.

Bao: Making Learned Query Optimization Practical

- Computer Science
- SIGMOD Conference
- 2021

Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints, and combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm.

Deep Contextual Bandits for Fast Initial Access in mmWave Based User-Centric Ultra-Dense Networks

- Computer Science
- 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring)
- 2021

A novel deep contextual bandit (DCB) based approach to perform fast and efficient IA in mmWave based UC UD networks using one reference signal from the user to predict the IA beam, which improves beam discovery delay and relaxes the requirement for radio resources.

#### References

Showing 1–10 of 29 references.

Contextual Multi-Armed Bandits

- Mathematics, Computer Science
- AISTATS
- 2010

A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space, giving almost matching upper and lower bounds for finite spaces or convex bounded subsets of Euclidean spaces.

Thompson Sampling for Contextual Bandits with Linear Payoffs

- Computer Science, Mathematics
- ICML
- 2013

A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, in the case where the contexts are provided by an adaptive adversary.

Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science
- Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

- Computer Science, Mathematics
- Found. Trends Mach. Learn.
- 2012

The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

Using Confidence Bounds for Exploitation-Exploration Trade-offs

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2002

It is shown how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off, and improves the regret from O(T^{3/4}) to O(T^{1/2}).

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

- Mathematics, Computer Science
- ICML
- 2016

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.

The Nonstochastic Multiarmed Bandit Problem

- Mathematics, Computer Science
- SIAM J. Comput.
- 2002

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.

Concrete Dropout

- Computer Science, Mathematics
- NIPS
- 2017

This work proposes a new dropout variant which gives improved performance and better calibrated uncertainties, and uses a continuous relaxation of dropout's discrete masks to allow for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles.

A contextual-bandit approach to personalized news article recommendation

- Computer Science
- WWW '10
- 2010

This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Ensemble learning in Bayesian neural networks

- Mathematics
- 1998

Bayesian treatments of learning in neural networks are typically based either on a local Gaussian approximation to a mode of the posterior weight distribution, or on Markov chain Monte Carlo…