Linear Programming for Finite State Multi-Armed Bandit Problems

@article{Chen1986LinearPF,
  title={Linear Programming for Finite State Multi-Armed Bandit Problems},
  author={Yih Ren Chen and Michael N. Katehakis},
  journal={Math. Oper. Res.},
  year={1986},
  volume={11},
  pages={180--183}
}
We consider the multi-armed bandit problem. We show that when the state space is finite, the computation of the dynamic allocation indices can be handled by linear programming methods.
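The paper itself computes the dynamic allocation (Gittins) indices by linear programming; that formulation is not reproduced here. As an illustrative sketch only, the same quantity can be obtained from the closely related restart-in-state characterization (due to Katehakis and Veinott, who appear in the references below), solved here by plain value iteration:

```python
def gittins_index(r, P, beta, s, iters=600):
    """Approximate the Gittins index of state s for a single arm with
    reward vector r, transition matrix P, and discount factor beta,
    via value iteration on the restart-in-s optimal-stopping problem."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        # value of continuing from each state x
        cont = [r[x] + beta * sum(P[x][y] * v[y] for y in range(n))
                for x in range(n)]
        # in the restart problem, every state may instead restart in s
        v = [max(cont[x], cont[s]) for x in range(n)]
    # the index is the normalized value of the restart problem at s
    return (1.0 - beta) * v[s]
```

For example, a two-state arm with rewards (1, 0), uniform transitions, and beta = 0.9 yields indices 1.0 and 0.45.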

Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

  • Qing Zhao
  • Computer Science
    Multi-Armed Bandits
  • 2019
Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments and need to be addressed in the context of knowledge retrieval and reinforcement learning.

Survey of linear programming for standard and nonstandard Markovian control problems. Part II: Applications

  • L. Kallenberg
  • Computer Science, Mathematics
    Math. Methods Oper. Res.
  • 1994
This paper deals with some applications of Markov decision models for which the linear programming method is efficient, including replacement models, separable models and the multi-armed bandit model.

Monotonic Approximation of the Gittins Index

The Gittins index is useful in the study of bandit processes and Markov decision processes, and can be approximated by finite-horizon break-even values determined from truncated finite-horizon problems.

Stochastic Linear Bandits with Finitely Many Arms

The core idea is to introduce phases of determinism into the algorithm so that within each phase the actions are chosen independently of the rewards.

The Multi-Armed Bandit Problem: Decomposition and Computation

It is shown that an approximate largest-index rule yields an approximately optimal policy for the N-project problem, and more efficient methods of computing the indices on-line and/or for sparse transition matrices in large state spaces than have been suggested heretofore.
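The largest-index rule mentioned here can be sketched in a few lines; `index_of` is a hypothetical stand-in for any exact or approximate index table:

```python
def largest_index_rule(states, index_of):
    """Play the arm whose current state carries the largest index.

    states:   current state of each arm
    index_of: function (arm, state) -> index value (exact or approximate)
    """
    return max(range(len(states)), key=lambda i: index_of(i, states[i]))
```

The cited result is that running this rule with approximate indices yields an approximately optimal policy for the N-project problem.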

The Multi-Armed Bandit Problem: Computational Aspects

The problem is to find a selection rule which maximizes the α-discounted rewards.

Finite State and Action MDPs

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory, developed since the end of the fifties. We consider finite and infinite horizon models.

Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems

The approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.

Multi‐Armed Bandits, Gittins Index, and its Calculation

Multi-armed bandit is a colorful term that refers to the dilemma faced by a gambler playing in a casino with multiple slot machines (which were colloquially called one-armed bandits).
...

References

Linear programming and finite Markovian control problems

This text is a revised version of the author's thesis for the University of Leiden and is mainly concerned with the theory of finite Markov decision problems: problems in which a decision maker controls a finite Markov chain.

The Multi-Armed Bandit Problem: Decomposition and Computation

It is shown that an approximate largest-index rule yields an approximately optimal policy for the N-project problem, and more efficient methods of computing the indices on-line and/or for sparse transition matrices in large state spaces than have been suggested heretofore.

Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints

This paper investigates the computation of transient-optimal policies in discrete dynamic programming. The concept of superharmonicity is introduced, which provides the linear program used to compute the transient value-vector and a transient-optimal policy.

Some aspects of the sequential design of experiments

Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined before the experimentation begins.

Optimization Over Time

Finite State Markovian Decision Processes

Bandit processes and dynamic allocation indices

The paper aims to give a unified account of the central concepts in recent work on bandit processes and dynamic allocation indices, and to show how these reduce some previously intractable problems to almost triviality.

Extensions of the Multi-Armed Bandit Problem

  • 1984

Discussant of J. C. Gittins

  • 1979