# Continuous-in-time Limit for Bayesian Bandits

@article{Zhu2022ContinuousintimeLF, title={Continuous-in-time Limit for Bayesian Bandits}, author={Yuhua Zhu and Zachary Izzo and Lexing Ying}, journal={ArXiv}, year={2022}, volume={abs/2210.07513} }

This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian…
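The intractability mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-armed Bernoulli bandit with independent uniform Beta(1, 1) priors, which is an illustrative setup chosen here and not the paper's rescaled formulation. The names `optimal_bayes_value` and `bayes_regret` are hypothetical. The code solves the Bayes-optimal policy exactly by dynamic programming over posterior counts and reports how many states had to be stored.

```python
from functools import lru_cache


def optimal_bayes_value(horizon):
    """Exact Bayes-optimal expected reward for a two-armed Bernoulli bandit.

    Assumption (not from the source): each arm mean has a Beta(1, 1) prior.
    Returns (value, number_of_cached_states).
    """

    @lru_cache(maxsize=None)
    def value(a1, b1, a2, b2, remaining):
        # State: Beta(a, b) posterior parameters per arm and steps left.
        if remaining == 0:
            return 0.0
        p1 = a1 / (a1 + b1)  # posterior mean of arm 1
        p2 = a2 / (a2 + b2)  # posterior mean of arm 2
        # Expected reward-to-go if arm 1 is pulled now.
        q1 = (p1 * (1.0 + value(a1 + 1, b1, a2, b2, remaining - 1))
              + (1.0 - p1) * value(a1, b1 + 1, a2, b2, remaining - 1))
        # Expected reward-to-go if arm 2 is pulled now.
        q2 = (p2 * (1.0 + value(a1, b1, a2 + 1, b2, remaining - 1))
              + (1.0 - p2) * value(a1, b1, a2, b2 + 1, remaining - 1))
        return max(q1, q2)

    best = value(1, 1, 1, 1, horizon)
    return best, value.cache_info().currsize


def bayes_regret(horizon):
    """Bayesian regret of the optimal policy under uniform priors.

    For two independent Uniform(0, 1) means, E[max(mu1, mu2)] = 2/3.
    """
    best, _ = optimal_bayes_value(horizon)
    return horizon * (2.0 / 3.0) - best


if __name__ == "__main__":
    for n in (5, 10, 20):
        best, states = optimal_bayes_value(n)
        print(n, round(bayes_regret(n), 4), states)
```

Under these assumptions the number of stored states grows polynomially in the horizon for two arms, and the exponent grows with the number of arms, which is one way to see why exact computation becomes impractical for long horizons or many arms.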

## One Citation

### A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

- Mathematics, Computer Science
- ArXiv
- 2022

This work explicitly computes the leading-order term of the optimal regret and pseudoregret in three different scaling regimes for the gap, in the limit where the gap between the means of the two arms goes to zero and the number of prediction periods approaches infinity.

## References

### Bandit Algorithms

- Mathematics
- 2020

sets of environments and policies respectively and $\ell : \mathcal{E} \times \Pi \to [0, 1]$ a bounded loss function. Given a policy $\pi$ let $\ell(\pi) = (\ell(\nu_1, \pi), \ldots, \ell(\nu_N, \pi))$ be the loss vector resulting from policy $\pi$.…

### A New Approach to Drifting Games, Based on Asymptotically Optimal Potentials

- Computer Science, Mathematics
- ArXiv
- 2022

A new approach is developed to drifting games, a class of two-person games with many applications to boosting and online learning settings, including Prediction with Expert Advice and the Hedge game. The approach gives new potentials and derives corresponding upper and lower bounds that match each other in the asymptotic regime.

### A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

- Mathematics, Computer Science
- ArXiv
- 2022

This work explicitly computes the leading-order term of the optimal regret and pseudoregret in three different scaling regimes for the gap, in the limit where the gap between the means of the two arms goes to zero and the number of prediction periods approaches infinity.

### Diffusion Approximations for Thompson Sampling

- Computer Science, Mathematics
- ArXiv
- 2021

The weak convergence theory covers both the classical multi-armed and linear bandit settings, and can be used to obtain insight about the characteristics of the regret distribution when there is information sharing among arms, as well as the effects of variance estimation, model mis-specification and batched updates in bandit learning.

### Diffusion Asymptotics for Sequential Experiments

- Mathematics, Computer Science
- ArXiv
- 2021

This work proposes a new diffusion-asymptotic analysis for sequentially randomized experiments that lets the mean signal level scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ gets large.

### A Note on Optimization Formulations of Markov Decision Processes

- Mathematics, Computer Science
- 2020

This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both the discounted and undiscounted processes under the standard and the…

### Diffusion Approximations for a Class of Sequential Experimentation Problems

- Computer Science
- Manag. Sci.
- 2022

A diffusion approximation is derived for the sequential experimentation problem of a seller who wants to select an optimal assortment of products to launch into the marketplace and is uncertain about consumers’ preferences. The effectiveness and robustness of the heuristics derived from the diffusion approximation are demonstrated.

### Sequential Procurement with Contractual and Experimental Learning

- Economics
- Manag. Sci.
- 2022

The effect strategic sellers have on the buyer's optimal strategy relative to more traditional learning dynamics is identified, and it is established that, paradoxically, when sellers are strategic, the ability to observe delivered quality is not always beneficial for the buyer.

### Recommender Systems as Mechanisms for Social Learning

- Computer Science
- 2018

This article studies how a recommender system may incentivize users to learn about a product collaboratively, “seed” incentives for user exploration, and determine the speed and trajectory of social learning.