# The Multi-Armed Bandit Problem: Decomposition and Computation

@article{Katehakis1987TheMB, title={The Multi-Armed Bandit Problem: Decomposition and Computation}, author={Michael N. Katehakis and Arthur F. Veinott}, journal={Math. Oper. Res.}, year={1987}, volume={12}, pages={262-268} }

This paper is dedicated to our friend and mentor, Cyrus Derman, on the occasion of his 60th birthday.
The multi-armed bandit problem arises in sequentially allocating effort to one of N projects and sequentially assigning patients to one of N treatments in clinical trials. Gittins and Jones [Gittins, J. C., Jones, D. M. 1974. A dynamic allocation index for the sequential design of experiments. J. Gani, K. Sarkadi, L. Vince, eds. Progress in Statistics. European Meeting of Statisticians, 1972…]

#### 268 Citations

Multiarmed Bandits and Gittins Index

- Computer Science
- 2011

The multiarmed bandit problem is a sequential decision problem about allocating effort amongst a number of alternative projects, only one of which may receive effort at a time, and its solution in terms of the Gittins index is described.

The multi-armed bandit, with constraints

- Computer Science, Mathematics
- PERV
- 2012

Pair-wise comparison, rather than optimal stopping, is used to demonstrate the optimality of a priority rule, and the transition probabilities and one-step rewards of the transformed bandits are used to compute the performance characteristics of index policies in polynomial time.

Essays on sequential analysis: Multi-armed bandit with availability constraints and sequential change detection and identification

- Computer Science
- 2009

This dissertation addresses two sequential decision problems, the multi-armed bandit problem and the sequential change detection and identification problem, proposing simple sequential decision strategies and showing their asymptotic optimality under two Bayesian formulations.

MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT

- Mathematics
- Probability in the Engineering and Informational Sciences
- 2014

Generally, the multi-armed bandit problem has been studied under the setting that at each time step over an infinite horizon a controller chooses to activate a single process or bandit out of a finite collection of…

Q-Learning for Bandit Problems

- Mathematics, Computer Science
- ICML
- 1995

This paper suggests utilizing task-state-specific Q-learning agents to solve their respective restart-in-state-$i$ subproblems, and includes an example in which the online reinforcement learning approach is applied to a simple problem of stochastic scheduling.

Risk-Sensitive and Risk-Neutral Multiarmed Bandits

- Mathematics, Computer Science
- Math. Oper. Res.
- 2007

It is optimal to play at each epoch any bandit whose current state is not dominated by the current states of the other bandits, and this result is obtained by a coherent analysis that encompasses three models: one with risk-averse exponential utility, one with risk-seeking exponential utility, and one with linear utility and either stopping or discounting.

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.

- Computer Science, Mathematics
- Statistical science : a review journal of the Institute of Mathematical Statistics
- 2015

A novel bandit-based patient allocation rule is proposed that overcomes the issue of low power, thus removing a potential barrier for their use in practice and indicating that bandit approaches offer significant advantages and severe limitations in terms of their resulting statistical power.

A Perpetual Search for Talent Across Overlapping Generations: A Learning Process

- Mathematics
- 2014

We consider a class of multi-armed bandit problems which is at the same time an arm-acquiring, restless and mortal bandit, and where the rewards follow any distribution. This is the case for a…

Reflections on a New Approach to Gittins Indexation

- Computer Science
- 1996

A simple dynamic programming proof of the optimality of Gittins index policies and a range of index-based suboptimality bounds for general policies for a variety of stochastic models for resource allocation are obtained.

The Multi-Armed Bandit Problem: Computational Aspects

- Computer Science
- 1990

The problem is to find a selection rule which maximizes the α-discounted rewards.

#### References

Extensions of the multiarmed bandit problem: The discounted case

- Computer Science
- 1985

A reformulation of the bandit problem yields the tax problem, which includes Klimov's waiting time problem, and an index rule is derived for the case where new machines arrive randomly.

Linear Programming for Finite State Multi-Armed Bandit Problems

- Mathematics, Computer Science
- Math. Oper. Res.
- 1986

It is shown that, when the state space is finite, the computation of the dynamic allocation indices can be handled by linear programming methods.

A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index

- Mathematics, Computer Science
- Math. Oper. Res.
- 1986

It is shown that, instead of the $K_v$ linear programs for project $v$, a single parametric linear program with the same dimensions can be solved.

Bandit Processes and Dynamic Allocation Indices

- J. Roy. Statist. Soc. Ser. B
- 1979

Introduction to Stochastic Dynamic Programming

- Academic Press
- 1983

A Dynamic Allocation Index for the Sequential Design of Experiments

- 1974