The Multi-Armed Bandit Problem: Decomposition and Computation

Michael N. Katehakis and Arthur F. Veinott
Math. Oper. Res.
This paper is dedicated to our friend and mentor, Cyrus Derman, on the occasion of his 60th birthday. The multi-armed bandit problem arises in sequentially allocating effort to one of N projects and sequentially assigning patients to one of N treatments in clinical trials. Gittins and Jones [Gittins, J. C., Jones, D. M. 1974. A dynamic allocation index for the sequential design of experiments. J. Gani, K. Sarkadi, L. Vince, eds. Progress in Statistics. European Meeting of Statisticians, 1972…
Multiarmed Bandits and Gittins Index
The multiarmed bandit problem is a sequential decision problem about allocating effort amongst a number of alternative projects, only one of which may receive effort at a time, and its solution in terms of the Gittins index is described.
The multi-armed bandit, with constraints
Pair-wise comparison, rather than optimal stopping, is used to demonstrate the optimality of a priority rule, and the transition probabilities and one-step rewards of the transformed bandits are used to compute the performance characteristics of index policies in polynomial time.
Essays on sequential analysis: Multi-armed bandit with availability constraints and sequential change detection and identification
This dissertation addresses two sequential decision problems, the multi-armed bandit and the sequential change detection and identification problems, proposes simple sequential decision strategies, and shows their asymptotic optimality under two Bayesian formulations.
Generally, the multi-armed bandit has been studied under the setting that at each time step over an infinite horizon a controller chooses to activate a single process or bandit out of a finite collection of…
Q-Learning for Bandit Problems
  • M. Duff
  • Mathematics, Computer Science
  • ICML
  • 1995
This paper suggests utilizing task-state-specific Q-learning agents to solve their respective restart-in-state-$i$ subproblems, and includes an example in which the online reinforcement learning approach is applied to a simple problem of stochastic scheduling.
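The restart-in-state-$i$ subproblem mentioned in this entry can be illustrated with a small sketch. It assumes a single arm with a finite state space, a known transition matrix `P`, a reward vector `r`, and a discount factor `alpha` (all names here are hypothetical, chosen for illustration). It uses plain value iteration rather than the Q-learning agents the entry describes, and relies on the standard restart-in-state characterization that the Gittins index of state $i$ equals $(1-\alpha)$ times the optimal value at $i$ of the problem in which one may, in every state, either continue or restart from $i$.

```python
def gittins_index_restart(P, r, alpha, state, tol=1e-10, max_iter=100_000):
    """Approximate the Gittins index of `state` for one finite-state arm.

    Assumptions (illustrative, not taken from the listed papers):
    P is a row-stochastic list of lists, r is a list of one-step rewards,
    and 0 < alpha < 1 is the discount factor.
    """
    n = len(r)
    V = [0.0] * n  # value function of the restart-in-`state` problem
    for _ in range(max_iter):
        # Value of continuing from each state j.
        cont = [r[j] + alpha * sum(P[j][k] * V[k] for k in range(n))
                for j in range(n)]
        # Restarting means behaving as if the arm were in `state`.
        restart = cont[state]
        V_new = [max(c, restart) for c in cont]
        gap = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    return (1.0 - alpha) * V[state]


if __name__ == "__main__":
    # Two-state chain: state 0 pays 0 and moves to state 1, which pays 1 forever.
    P = [[0.0, 1.0], [0.0, 1.0]]
    r = [0.0, 1.0]
    print(gittins_index_restart(P, r, 0.9, 0))
    print(gittins_index_restart(P, r, 0.9, 1))
```

In this toy chain the absorbing state's index is its own reward, and the index of the initial state is discounted by one step of delay, which matches the interpretation of the index as the best achievable discounted reward rate.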
Risk-Sensitive and Risk-Neutral Multiarmed Bandits
It is optimal to play at each epoch any bandit whose current state is not dominated by the current states of the other bandits, and this result is obtained by a coherent analysis that encompasses three models: one with risk-averse exponential utility, one with risk-seeking exponential utility, and one with linear utility and either stopping or discounting.
Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.
  • S. Villar, J. Bowden, J. Wason
  • Computer Science, Mathematics
  • Statistical Science: A Review Journal of the Institute of Mathematical Statistics
  • 2015
A novel bandit-based patient allocation rule is proposed that overcomes the issue of low power, thus removing a potential barrier for their use in practice and indicating that bandit approaches offer significant advantages and severe limitations in terms of their resulting statistical power.
A Perpetual Search for Talent Across Overlapping Generations: A Learning Process
We consider a class of multi-armed bandit problems which is at the same time an arm-acquiring, restless and mortal bandit, and where the rewards follow any distribution. This is the case for a…
Reflections on a New Approach to Gittins Indexation
A simple dynamic programming proof of the optimality of Gittins index policies and a range of index-based suboptimality bounds for general policies for a variety of stochastic models for resource allocation are obtained.
The Multi-Armed Bandit Problem: Computational Aspects
The problem is to find a selection rule which maximizes the expected total α-discounted reward.
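For orientation, the discounted objective is usually written as follows. The notation is assumed here rather than taken from the entry: $\pi(t)$ is the project selected at time $t$, $x_i(t)$ is the state of project $i$, $R_i$ is its one-step reward function, and $0 < \alpha < 1$ is the discount factor.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \alpha^{t}\, R_{\pi(t)}\bigl(x_{\pi(t)}(t)\bigr) \right]
```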


Extensions of the multiarmed bandit problem: The discounted case
A reformulation of the bandit problem yields the tax problem, which includes Klimov's waiting time problem, and an index rule is derived for the case where new machines arrive randomly.
Linear Programming for Finite State Multi-Armed Bandit Problems
It is shown that, when the state space is finite, the computation of the dynamic allocation indices can be handled by linear programming methods.
A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index
It is shown that, instead of the $K_v$ linear programs for project $v$, a single parametric linear program with the same dimensions can be solved.
Bandit processes and dynamic allocation indices
  • J. Roy. Statist. Soc. Ser. B
  • 1979
A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index
  • Math. Oper. Res.
  • 1986
Introduction to Stochastic Dynamic Programming
  • Academic Press
  • 1983
Optimization Over Time
Multi‐Armed Bandits and the Gittins Index
A Dynamic Allocation Index for the Sequential Design of Experiments
  • 1974