Corpus ID: 50787267

Deep Contextual Multi-armed Bandits

  • Mark Collier and Hector Llorens
Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty-driven (principled) exploration, or non-linearly, using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling.
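The framework keeps dropout active at inference time, so that a single stochastic forward pass acts as one Thompson sample of per-arm rewards. A minimal NumPy sketch of that idea; the layer sizes, weights, and dropout rate are illustrative, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network; dropout stays ON at inference so each
# forward pass approximates one sample from the posterior over reward functions.
W1 = rng.normal(size=(4, 16))            # context dim 4 -> 16 hidden units
W2 = rng.normal(size=(16, 3))            # hidden units -> 3 arms

def sample_rewards(context, p_drop=0.5):
    """One stochastic forward pass = one Thompson sample of per-arm rewards."""
    h = np.maximum(context @ W1, 0.0)     # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop   # dropout mask kept active at test time
    h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
    return h @ W2

context = rng.normal(size=4)
arm = int(np.argmax(sample_rewards(context)))  # act greedily on the sample
```

Acting greedily on a fresh stochastic sample each round explores arms in proportion to the model's uncertainty about them, without any epsilon parameter.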
An Empirical Study of Neural Kernel Bandits
It is proposed to directly apply NK-induced distributions to guide an upper confidence bound or Thompson sampling-based policy, and it is shown that NK bandits achieve state-of-the-art performance on highly non-linear structured data.
Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning
The hedging problem is viewed as an instance of a risk-averse contextual bandit problem, for which a large body of theoretical results and well-studied algorithms are available, and it is found that the k-armed bandit model naturally fits the P&L formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning.
Deep Reinforcement Learning with Weighted Q-Learning
This work provides the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes.
Modeling uncertainty to improve personalized recommendations via Bayesian deep learning
An approach based on Bayesian deep learning to improve personalized recommendations by capturing the uncertainty associated with the model output and utilizing it to boost exploration in the context of Recommender Systems.
Bayesian Deep Learning Based Exploration-Exploitation for Personalized Recommendations
  • X. Wang, Serdar Kadioglu
  • Computer Science
  • 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
  • 2019
This paper presents an approach based on Bayesian Deep Learning to learn a compact representation of user and item attributes to guide exploitation and shows how to further boost exploration by incorporating model uncertainty with that of data uncertainty.
Bao: Learning to Steer Query Optimizers
Bao combines modern tree convolutional neural networks with Thompson sampling, a decades-old and well-studied reinforcement learning algorithm, to take advantage of the wisdom built into existing query optimizers by providing per-query optimization hints.
Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem
This work employs machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Bao: Making Learned Query Optimization Practical
Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints, and combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm.
Deep Contextual Bandits for Fast Initial Access in mmWave Based User-Centric Ultra-Dense Networks
A novel deep contextual bandit (DCB) based approach to perform fast and efficient IA in mmWave based UC UD networks using one reference signal from the user to predict the IA beam, which improves beam discovery delay and relaxes the requirement for radio resources.

References

Contextual Multi-Armed Bandits
A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space respectively, and this gives an almost matching upper and lower bound for finite spaces or convex bounded subsets of Euclidean spaces.
Thompson Sampling for Contextual Bandits with Linear Payoffs
A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary, is designed and analyzed.
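The Agrawal and Goyal construction maintains a Gaussian posterior over each arm's linear payoff weights and acts greedily on one posterior sample per round. A hedged sketch; the dimensions and exploration scale v below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms, v = 3, 2, 0.5                 # context dim, arms, exploration scale

# Per-arm Gaussian posterior over linear payoff weights.
B = [np.eye(d) for _ in range(n_arms)]   # precision matrices (start at identity)
f = [np.zeros(d) for _ in range(n_arms)] # reward-weighted feature sums

def choose(x):
    """Sample weights from each arm's posterior; play the best sampled payoff."""
    scores = []
    for a in range(n_arms):
        mean = np.linalg.solve(B[a], f[a])
        cov = v ** 2 * np.linalg.inv(B[a])
        scores.append(x @ rng.multivariate_normal(mean, cov))
    return int(np.argmax(scores))

def update(a, x, r):
    """Rank-one posterior update after observing reward r for arm a."""
    B[a] += np.outer(x, x)
    f[a] += r * x

x = rng.normal(size=d)
a = choose(x)
update(a, x, 1.0)
```

The sampled-weight step is what distinguishes Thompson sampling from UCB-style methods: exploration comes from posterior randomness rather than an explicit bonus term.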
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
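The policy analyzed there, UCB1, adds a sqrt(2 ln t / n) confidence bonus to each arm's empirical mean. A self-contained sketch; the Bernoulli arm means are hypothetical:

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls), for logarithmic regret."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            a = t                                    # initialization: one pull each
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            a = int(np.argmax(sums / counts + bonus))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts

# Hypothetical Bernoulli arms with means 0.2 and 0.8.
rng = np.random.default_rng(2)
means = [0.2, 0.8]
counts = ucb1(lambda a: float(rng.random() < means[a]), 2, 2000)
```

Because the bonus shrinks as an arm accumulates pulls, the suboptimal arm is chosen only O(log T) times.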
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
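Both settings are measured by regret, the expected shortfall against always playing the best arm. A small helper, using hypothetical arm means for illustration:

```python
import numpy as np

def cumulative_regret(means, arms_played):
    """Pseudo-regret: expected reward of always playing the best arm,
    minus the expected reward of the arms actually played."""
    means = np.asarray(means)
    return len(arms_played) * means.max() - means[np.asarray(arms_played)].sum()

# Hypothetical example: the best arm has mean 0.9; one pull of arm 0 forgoes 0.4.
regret = cumulative_regret([0.5, 0.9], [0, 1, 1])
```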
Using Confidence Bounds for Exploitation-Exploration Trade-offs
  • P. Auer
  • Mathematics, Computer Science
  • J. Mach. Learn. Res.
  • 2002
It is shown how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off, improving the regret from O(T^(3/4)) to O(T^(1/2)).
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
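The practical recipe from that framework is simple: keep dropout on at test time, run many forward passes, and read a predictive mean and an uncertainty estimate off the samples. A toy sketch, with dropout applied to the inputs of a single linear layer purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def mc_dropout_predict(x, W, p_drop=0.5, n_samples=200):
    """Monte Carlo dropout: average many stochastic forward passes for a
    predictive mean, and use their spread as a model-uncertainty estimate."""
    outs = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) > p_drop            # dropout on the inputs
        outs.append(((x * mask) / (1.0 - p_drop)) @ W)
    outs = np.array(outs)
    return outs.mean(axis=0), outs.std(axis=0)

W = rng.normal(size=(8, 1))    # a single linear layer, for illustration only
x = rng.normal(size=8)
mean, std = mc_dropout_predict(x, W)
```

The standard deviation across passes is the quantity the bandit papers above plug into Thompson sampling or confidence-bound policies.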
The Nonstochastic Multiarmed Bandit Problem
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
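The algorithm from that work, EXP3, keeps exponential weights over arms and corrects each observed reward by its inverse sampling probability. A minimal sketch; rewards are assumed in [0, 1] and the deterministic arms are hypothetical:

```python
import numpy as np

def exp3(pull, n_arms, horizon, gamma, rng):
    """EXP3: exponential weights with importance-weighted reward estimates,
    giving regret guarantees even against adversarial payoffs."""
    w = np.ones(n_arms)
    for _ in range(horizon):
        p = (1.0 - gamma) * w / w.sum() + gamma / n_arms  # mix in uniform play
        a = rng.choice(n_arms, p=p)
        r = pull(a)                                       # reward in [0, 1]
        w[a] *= np.exp(gamma * (r / p[a]) / n_arms)       # unbiased estimate
    return w

rng = np.random.default_rng(4)
w = exp3(lambda a: 1.0 if a == 1 else 0.0, 2, 500, 0.1, rng)
```

The gamma-mixed uniform exploration keeps every arm's sampling probability bounded away from zero, which is what makes the importance-weighted estimates well behaved.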
Concrete Dropout
This work proposes a new dropout variant which gives improved performance and better calibrated uncertainties, and uses a continuous relaxation of dropout’s discrete masks to allow for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles.
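The continuous relaxation replaces the hard Bernoulli mask with a sigmoid of the drop logit plus Logistic noise, so the dropout probability can receive gradients. A NumPy sketch of the mask alone; the temperature value is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def concrete_mask(p_drop, shape, temperature=0.1):
    """Concrete relaxation of a Bernoulli dropout mask: sigmoid of the drop
    logit plus Logistic noise, so p_drop can be tuned by gradient descent."""
    u = rng.uniform(1e-7, 1.0 - 1e-7, size=shape)      # avoid log(0)
    logits = (np.log(p_drop) - np.log(1.0 - p_drop)
              + np.log(u) - np.log(1.0 - u)) / temperature
    logits = np.clip(logits, -30.0, 30.0)              # numerical safety
    drop = 1.0 / (1.0 + np.exp(-logits))               # soft "drop" decision
    return 1.0 - drop                                  # soft "keep" mask in (0, 1)

mask = concrete_mask(0.5, shape=(10000,))
```

At low temperature the mask entries concentrate near 0 and 1, recovering ordinary dropout, while remaining differentiable with respect to p_drop.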
A contextual-bandit approach to personalized news article recommendation
This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
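The algorithm introduced there, LinUCB, scores each arm by a ridge-regression payoff estimate plus a confidence width. A minimal sketch of the arm-selection step; the dimensions and alpha are illustrative:

```python
import numpy as np

def linucb_choose(x, A, b, alpha=1.0):
    """LinUCB: per-arm ridge-regression payoff estimate plus a confidence
    width; play the arm with the largest upper confidence bound."""
    scores = []
    for A_a, b_a in zip(A, b):
        A_inv = np.linalg.inv(A_a)
        theta = A_inv @ b_a                       # ridge estimate of arm weights
        scores.append(x @ theta + alpha * np.sqrt(x @ A_inv @ x))
    return int(np.argmax(scores))

d, n_arms = 3, 2
A = [np.eye(d) for _ in range(n_arms)]            # per-arm design matrices
b = [np.zeros(d) for _ in range(n_arms)]          # per-arm reward vectors
x = np.array([1.0, 0.0, 0.5])
arm = linucb_choose(x, A, b)
```

After observing a click reward r, the chosen arm's statistics are updated with A[arm] += outer(x, x) and b[arm] += r * x, shrinking its confidence width on similar contexts.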
Ensemble learning in Bayesian neural networks
Bayesian treatments of learning in neural networks are typically based either on a local Gaussian approximation to a mode of the posterior weight distribution, or on Markov chain Monte Carlo sampling.