# Gittins’ theorem under uncertainty

@article{Cohen2022GittinsTU, title={Gittins’ theorem under uncertainty}, author={Samuel N. Cohen and Tanut Treetanthiploet}, journal={Electronic Journal of Probability}, year={2022} }

We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the…
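To make the Gittins allocation index concrete, the following is a minimal sketch (not the paper's nonlinear-expectation construction) of the classical index for a Bernoulli arm with a Beta posterior. It uses the standard "retirement option" calibration: the index is the per-step safe payoff `lam` at which the agent is indifferent between retiring and continuing to play the arm. The discount factor `beta`, the truncation `horizon`, and the tolerance `tol` are illustrative choices, and the finite-horizon truncation makes this an approximation to the true index.

```python
def gittins_index(a0, b0, beta=0.9, horizon=60, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with Beta(a0, b0)
    posterior, by binary search for the retirement payoff lam at which
    playing the arm and retiring are equally attractive."""

    def value(lam):
        # Finite-horizon dynamic program over posterior states (a, b);
        # the depth is a + b - a0 - b0, so (a, b) alone keys the cache.
        cache = {}

        def V(a, b):
            if (a, b) in cache:
                return cache[(a, b)]
            p = a / (a + b)  # posterior mean of the arm
            if a + b - a0 - b0 >= horizon:
                # Terminal approximation: play the better option forever.
                v = max(lam, p) / (1 - beta)
            else:
                cont = p * (1 + beta * V(a + 1, b)) \
                     + (1 - p) * beta * V(a, b + 1)
                v = max(lam / (1 - beta), cont)  # retire or continue
            cache[(a, b)] = v
            return v

        return V(a0, b0)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if value(lam) > lam / (1 - beta) + 1e-9:
            lo = lam  # arm still strictly worth playing: index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2
```

The index policy then simply plays, at each step, the arm whose current posterior has the largest index; the paper's contribution is that a suitably defined index of this kind remains optimal (in a relaxed sense) when the expectation itself is nonlinear.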

## 2 Citations

Asymptotic Randomised Control with applications to bandits

- Computer Science, Mathematics
- 2020

This work considers a general multi-armed bandit problem with correlated elements, as a relaxed control problem, and obtains a smooth asymptotic approximation to the value function by introducing an entropy premium.

Gambling under unknown probabilities as a proxy for real world decisions under uncertainty

- Economics
- 2021

We give elementary examples within a framework for studying decisions under uncertainty where probabilities are only roughly known. The framework, in gambling terms, is that the size of a bet is…

## References

Showing 1-10 of 101 references

Optimal stopping under ambiguity in continuous time

- Mathematics
- 2013

We develop a theory of optimal stopping problems under ambiguity in continuous time. Using results from (backward) stochastic calculus, we characterize the value function as the smallest (nonlinear)…

American Options, Multi–armed Bandits, and Optimal Consumption Plans: A Unifying View

- Mathematics
- 2003

In this survey, we show that various stochastic optimization problems arising in option theory, in dynamical allocation problems, and in the microeconomic theory of intertemporal consumption choice…

Optimal Stopping With Multiple Priors

- Economics
- 2009

We develop a theory of optimal stopping under Knightian uncertainty. A suitable martingale theory for multiple priors is derived that extends the classical dynamic programming or Snell envelope…

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

- Computer Science, COLT
- 2016

It turns out that the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm.
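For comparison with the index strategy analysed in that reference, here is a minimal sketch of the textbook UCB1 policy on Bernoulli arms (this is the standard Auer-style rule, not the finite-horizon Gittins strategy itself; arm means, horizon, and seed are illustrative).

```python
import math
import random

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given true means and return
    the pseudo-regret relative to always playing the best arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once to initialise
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t + 1) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    return best * horizon - sum(means[i] * counts[i] for i in range(k))
```

The cited result shows that the finite-horizon Gittins index strategy achieves frequentist regret guarantees comparable to the logarithmic bounds known for this kind of UCB rule.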

Robust control of the multi-armed bandit problem

- Economics, Mathematics
- 2014

We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm…

General Gittins index processes in discrete time

- Mathematics, Proceedings of the National Academy of Sciences of the United States of America
- 1993

This work combines the formulation of Mandelbaum and Whittle to obtain a simple and constructive proof for the optimality of Gittins index processes in the general, nonmarkovian dynamic allocation (or "multi-armed bandit") problem.

Reflected Backward Stochastic Difference Equations and Optimal Stopping Problems under g-expectation

- Mathematics
- 2013

In this paper, we study reflected backward stochastic difference equations (RBSDEs for short) with finitely many states in discrete time. The general existence and uniqueness result, as well as…

Robust Multiarmed Bandit Problems

- Economics, Manag. Sci.
- 2016

A robust bandit problem is formulated in which a decision maker accounts for distrust in the nominal model by solving a worst-case problem against an adversary who has the ability to alter the underlying reward distribution and does so to minimize the decision maker’s expected total profit.

Four proofs of Gittins’ multiarmed bandit theorem

- Mathematics, Ann. Oper. Res.
- 2016

We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s…