Gittins’ theorem under uncertainty

Samuel N. Cohen and Tanut Treetanthiploet
Electronic Journal of Probability
We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the… 

Asymptotic Randomised Control with applications to bandits.
This work considers a general multi-armed bandit problem with correlated elements, as a relaxed control problem, and obtains a smooth asymptotic approximation to the value function by introducing an entropy premium.
Gambling under unknown probabilities as a proxy for real world decisions under uncertainty
We give elementary examples within a framework for studying decisions under uncertainty where probabilities are only roughly known. The framework, in gambling terms, is that the size of a bet is


Optimal stopping under ambiguity in continuous time
We develop a theory of optimal stopping problems under ambiguity in continuous time. Using results from (backward) stochastic calculus, we characterize the value function as the smallest (nonlinear)
American Options, Multi–armed Bandits, and Optimal Consumption Plans: A Unifying View
In this survey, we show that various stochastic optimization problems arising in option theory, in dynamical allocation problems, and in the microeconomic theory of intertemporal consumption choice
Optimal learning and experimentation in bandit problems
Optimal Stopping With Multiple Priors
We develop a theory of optimal stopping under Knightian uncertainty. A suitable martingale theory for multiple priors is derived that extends the classical dynamic programming or Snell envelope
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
A frequentist regret analysis of the Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon yields finite-time regret guarantees comparable to those available for the popular UCB algorithm.
Robust control of the multi-armed bandit problem
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm
General Gittins index processes in discrete time.
N. El Karoui, I. Karatzas. Mathematics. Proceedings of the National Academy of Sciences of the United States of America, 1993.
This work combines the formulation of Mandelbaum and Whittle to obtain a simple and constructive proof for the optimality of Gittins index processes in the general, non-Markovian dynamic allocation (or "multi-armed bandit") problem.
Reflected Backward Stochastic Difference Equations and Optimal Stopping Problems under g-expectation
In this paper, we study reflected backward stochastic difference equations (RBSDEs for short) with finitely many states in discrete time. The general existence and uniqueness result, as well as
Robust Multiarmed Bandit Problems
A robust bandit problem is formulated in which a decision maker accounts for distrust in the nominal model by solving a worst-case problem against an adversary who has the ability to alter the underlying reward distribution and does so to minimize the decision maker’s expected total profit.
Four proofs of Gittins’ multiarmed bandit theorem
We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s