# Langevin Monte Carlo for Contextual Bandits

@inproceedings{Xu2022LangevinMC,
title={Langevin Monte Carlo for Contextual Bandits},
author={Pan Xu and Hongkai Zheng and Eric V. Mazumdar and Kamyar Azizzadenesheli and Anima Anandkumar},
booktitle={International Conference on Machine Learning},
year={2022}
}
• Published in International Conference on Machine Learning, 22 June 2022 (Computer Science)
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high-dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. We propose an efficient posterior sampling…
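The abstract contrasts Laplace-approximation sampling with direct Langevin-based posterior sampling. The idea can be sketched for a linear-Gaussian contextual bandit; the function names, step sizes, and ridge prior below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def lmc_posterior_sample(X, y, theta0, step=1e-3, n_steps=100, prior_prec=1.0, rng=None):
    """Draw an approximate posterior sample with unadjusted Langevin dynamics.

    Assumes a Gaussian likelihood y ~ N(X theta, I) and a ridge prior, so the
    log-posterior gradient is X^T (y - X theta) - prior_prec * theta.
    No covariance matrix is formed or inverted, unlike Laplace-based sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0.copy()
    for _ in range(n_steps):
        grad = X.T @ (y - X @ theta) - prior_prec * theta
        theta += step * grad + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape)
    return theta

def thompson_choose(arm_feats, X, y, theta0, rng=None):
    """Thompson sampling step: act greedily w.r.t. one Langevin posterior sample."""
    theta = lmc_posterior_sample(X, y, theta0, rng=rng)
    return int(np.argmax(arm_feats @ theta)), theta
```

In the sequential setting the chain can be warm-started from the previous round's sample, so only a few gradient steps per round are needed.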
## Citations (3)

• ArXiv, 2022 (Computer Science, Mathematics): It is shown that HMC can sample from a distribution that is $\varepsilon$-close in total variation distance using $\widetilde{O}(\sqrt{\kappa}\, d^{1/4} \log(1/\varepsilon))$ gradient queries, where $\kappa$ is the condition number of $\Sigma$.
• ArXiv, 2022 (Computer Science): It is shown that greedy algorithms consistently outperform algorithms with efficient exploration, such as Thompson sampling, given enough timesteps, where the number of timesteps required increases with the complexity of the underlying features.
• Journal of Systems Science and Systems Engineering, 2022 (Computer Science): This paper applies the Thompson Sampling algorithm to the disjoint model, and provides a comprehensive regret analysis for a variant of the proposed algorithm that holds with probability $1 - \delta$ under the mean-variance criterion with risk tolerance $\rho$.
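The first citation above bounds the number of gradient queries HMC needs to sample a Gaussian to within $\varepsilon$ in total variation. A minimal leapfrog HMC transition, in which the target is accessed only through its potential $U(x) = -\log \pi(x)$ and gradient (the "gradient queries" being counted), might look as follows; this is an illustrative sketch, not the variant analyzed in the cited paper:

```python
import numpy as np

def hmc_step(x, U, grad_U, step=0.2, n_leapfrog=10, rng=None):
    """One HMC transition: leapfrog integration plus Metropolis correction."""
    rng = np.random.default_rng() if rng is None else rng
    p0 = rng.standard_normal(x.shape)                # resample momentum
    xn, p = x.copy(), p0 - 0.5 * step * grad_U(x)    # opening half momentum step
    for i in range(n_leapfrog):
        xn = xn + step * p                           # full position step
        if i < n_leapfrog - 1:
            p = p - step * grad_U(xn)                # full momentum step
    p = p - 0.5 * step * grad_U(xn)                  # closing half momentum step
    # Metropolis accept/reject on the Hamiltonian H = U(x) + |p|^2 / 2
    h_old = U(x) + 0.5 * p0 @ p0
    h_new = U(xn) + 0.5 * p @ p
    return xn if np.log(rng.uniform()) < h_old - h_new else x
```

Each transition uses `n_leapfrog + 1` gradient evaluations, which is the cost the cited bound counts.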

## References

Showing 1–10 of 50 references.

• ICML, 2013 (Computer Science): A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary, is designed and analyzed.
• ICML, 2020 (Computer Science): This work proposes two efficient Langevin MCMC algorithms tailored to Thompson sampling, and derives novel posterior concentration bounds and MCMC convergence rates for log-concave distributions which may be of independent interest.
• ICLR, 2021 (Computer Science): This paper proposes a new algorithm, Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation, with a novel posterior distribution of the reward whose mean is the neural network approximator and whose variance is built upon the neural tangent features of the corresponding network.
• ICLR, 2018 (Computer Science): This work benchmarks well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems, and finds that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario.
• ICML, 2020 (Computer Science): A new algorithm, NeuralUCB, is proposed, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of the reward for efficient exploration.
• ICML, 2020 (Computer Science): A new probabilistic modeling framework for Thompson sampling is proposed, in which local latent variable uncertainty is used to sample the mean reward, and a semi-implicit structure is further introduced to enhance its expressiveness.
• Math. Oper. Res., 2014 (Computer Science): A Bayesian regret bound for posterior sampling is established that applies broadly and can be specialized to many model classes; it depends on a new notion the authors refer to as the eluder dimension, which measures the degree of dependence among action rewards.
• ICML, 2011 (Computer Science): This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches, showing that by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates converge to samples from the true posterior distribution as the step size is annealed.
• NeurIPS, 2019 (Computer Science): It is shown that even small constant inference error can lead to poor performance (linear regret) due to under-exploration (for $\alpha < 1$) or over-exploration (for $\alpha > 0$) by the approximation; while for $\alpha > 0$ this is unavoidable, the regret can otherwise be improved by adding a small amount of forced exploration.
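The ICML 2011 reference above describes what is widely known as stochastic gradient Langevin dynamics (SGLD), the sampler family that Langevin-based Thompson sampling builds on. Its update can be sketched as follows; the helper-function names are assumptions for illustration, not the authors' pseudocode:

```python
import numpy as np

def sgld_step(theta, grad_log_prior, minibatch_grad_log_lik, n_total, n_batch, step, rng):
    """One SGLD update: noisy SGD whose injected noise matches the step size.

    The minibatch likelihood gradient is rescaled by n_total / n_batch so it is
    an unbiased estimate of the full-data gradient; the injected Gaussian noise
    has variance equal to the step size, so as the step size is annealed the
    iterates transition from stochastic optimization to posterior sampling.
    """
    grad = grad_log_prior(theta) + (n_total / n_batch) * minibatch_grad_log_lik(theta)
    noise = np.sqrt(step) * rng.standard_normal(theta.shape)
    return theta + 0.5 * step * grad + noise
```

With the noise term removed this is exactly stochastic gradient ascent on the log-posterior, which is what makes the method cheap enough for large datasets.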