# Linear Programming for Finite State Multi-Armed Bandit Problems

@article{Chen1986LinearPF, title={Linear Programming for Finite State Multi-Armed Bandit Problems}, author={Yih Ren Chen and Michael N. Katehakis}, journal={Math. Oper. Res.}, year={1986}, volume={11}, pages={180-183} }

We consider the multi-armed bandit problem. We show that when the state space is finite the computation of the dynamic allocation indices can be handled by linear programming methods.
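As a hedged illustration of how an index computation can be posed as a linear program, the sketch below combines the restart-in-state characterization of the index with the standard linear program for a discounted Markov decision process. It is not necessarily the formulation used in the paper. The notation is assumed here and does not come from the abstract: $S$ is the finite state space of one arm, $r(j)$ the one-step reward in state $j$, $p(j,k)$ the transition probabilities, $\beta \in (0,1)$ the discount factor, and $i$ the state whose index is sought.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{j \in S} v_j \\
\text{subject to}\quad & v_j \ge r(j) + \beta \sum_{k \in S} p(j,k)\, v_k, && j \in S, \\
                       & v_j \ge r(i) + \beta \sum_{k \in S} p(i,k)\, v_k, && j \in S.
\end{aligned}
```

Under these assumptions, if $v^{*}$ is the optimal solution, the dynamic allocation index of state $i$ in per-period reward units is $(1-\beta)\, v^{*}_i$. One such program is solved for each state $i$.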

## 81 Citations

### Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

- Computer Science
- Multi-Armed Bandits
- 2019

Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments and need to be addressed in the context of knowledge retrieval and reinforcement learning.

### Survey of linear programming for standard and nonstandard Markovian control problems. Part II: Applications

- Computer Science, Mathematics
- Math. Methods Oper. Res.
- 1994

This paper deals with some applications of Markov decision models for which the linear programming method is efficient, including replacement models, separable models and the multi-armed bandit model.

### Optimal stopping problems for multiarmed bandit processes with arms' independence

- Mathematics, Computer Science
- 1993

### Monotonic Approximation of the Gittins Index

- Mathematics
- 2002

The Gittins index is useful in the study of bandit processes and Markov decision processes, and can be approximated by finite horizon break-even values determined in the truncated finite horizon…

### Stochastic Linear Bandits with Finitely Many Arms

- Computer Science
- Bandit Algorithms
- 2020

The core idea is to introduce phases of determinism into the algorithm so that within each phase the actions are chosen independently of the rewards.

### The Multi-Armed Bandit Problem: Decomposition and Computation

- Mathematics
- Math. Oper. Res.
- 1987

It is shown that an approximate largest-index rule yields an approximately optimal policy for the N-project problem, and more efficient methods than previously suggested are given for computing the indices on-line and/or for sparse transition matrices in large state spaces.
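The index computation mentioned in this entry can be sketched with the restart-in-state characterization commonly associated with this 1987 paper: the index of state `i` equals `(1 - beta)` times the value at `i` of a problem in which, from every state, one may either continue or restart from `i`. The code below is a minimal illustration that solves that problem by plain value iteration. The function name `gittins_indices`, its parameters, and the use of value iteration are assumptions made for this example, not the paper's algorithm.

```python
from typing import List, Sequence


def gittins_indices(
    rewards: Sequence[float],
    transitions: Sequence[Sequence[float]],
    beta: float,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> List[float]:
    """Approximate the index of every state of a single finite-state arm.

    rewards[j] is the one-step reward in state j, transitions[j][k] the
    probability of moving from j to k, and beta in (0, 1) the discount factor.
    All names are hypothetical and chosen for this sketch.
    """
    n = len(rewards)
    indices = []
    for i in range(n):
        v = [0.0] * n  # value function of the restart-in-i problem
        for _ in range(max_iter):
            # Value of continuing the arm from each state j.
            cont = [
                rewards[j] + beta * sum(transitions[j][k] * v[k] for k in range(n))
                for j in range(n)
            ]
            # Restarting from i is the same as continuing from state i.
            restart = cont[i]
            new_v = [max(c, restart) for c in cont]
            diff = max(abs(a - b) for a, b in zip(new_v, v))
            v = new_v
            if diff < tol:
                break
        # Convert the discounted value at i into per-period reward units.
        indices.append((1.0 - beta) * v[i])
    return indices


if __name__ == "__main__":
    # Two-state arm: state 0 pays 0 and moves to state 1; state 1 pays 1 forever.
    print(gittins_indices([0.0, 1.0], [[0.0, 1.0], [0.0, 1.0]], beta=0.9))
```

In an N-project problem, a largest-index rule would then play, at each step, the arm whose current state has the highest index. Value iteration is used here only for simplicity; each restart problem could equally be solved by linear programming.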

### The Multi-Armed Bandit Problem: Computational Aspects

- Mathematics
- 1990

The problem is to find a selection rule which maximizes the α-discounted rewards.

### Finite State and Action MDPs

- Mathematics
- 2003

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite…

### Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems

- Mathematics, Computer Science
- IPCO
- 1993

The approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.

### Multi‐Armed Bandits, Gittins Index, and its Calculation

- Economics
- 2014

Multi-armed bandit is a colorful term that refers to the dilemma faced by a gambler playing in a casino with multiple slot machines (which were colloquially called one-armed bandits). What…

## References


### Linear programming and finite Markovian control problems

- Mathematics
- 1983

This text is a revised version of the author's thesis for the University of Leiden and is mainly concerned with the theory of finite Markov decision problems. Such problems are those where a decision…

### The Multi-Armed Bandit Problem: Decomposition and Computation

- Mathematics
- Math. Oper. Res.
- 1987

It is shown that an approximate largest-index rule yields an approximately optimal policy for the N-project problem, and more efficient methods than previously suggested are given for computing the indices on-line and/or for sparse transition matrices in large state spaces.

### Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints

- Mathematics
- Math. Program.
- 1984

This paper investigates the computation of transient-optimal policies in discrete dynamic programming. It introduces the concept of superharmonicity, which yields a linear program for computing the transient value-vector and a transient-optimal policy.

### Some aspects of the sequential design of experiments

- Mathematics
- 1952

Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined before the…

### Bandit processes and dynamic allocation indices

- Mathematics
- 1979

The paper aims to give a unified account of the central concepts in recent work on bandit processes and dynamic allocation indices; to show how these reduce some previously intractable problems to…

### Extensions of the Multi-Armed Bandit Problem

- 1984

### Discussant of J. C. Gittins

- 1979