Linear Programming for Finite State Multi-Armed Bandit Problems

  • Yih Ren Chen, Michael N. Katehakis
  • Mathematics of Operations Research
We consider the multi-armed bandit problem. We show that when the state space is finite, the computation of the dynamic allocation indices can be handled by linear programming methods.
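As a rough illustration of what such an index computation looks like, the sketch below computes the dynamic allocation (Gittins) indices of a small finite-state arm. It does not reproduce the paper's linear program; instead it uses the related restart-in-state characterization (associated with Katehakis and Veinott, and discussed in the calculation survey cited below), solved by plain value iteration. The rewards `r`, transition matrix `P`, and discount factor are a made-up example, not data from any of the cited works.

```python
# Sketch: Gittins indices of a finite-state arm via the restart-in-state
# characterization, solved by value iteration (an assumption for illustration;
# the paper itself uses linear programming instead).

def gittins_indices(r, P, beta, tol=1e-10):
    """For each state x, solve the MDP in which one may either continue
    from the current state or 'restart' the arm in state x; the Gittins
    index of x is (1 - beta) times the value of that MDP at x."""
    n = len(r)
    indices = []
    for x in range(n):
        V = [0.0] * n
        while True:
            # Value of the restart action (jump back to state x).
            restart = r[x] + beta * sum(P[x][z] * V[z] for z in range(n))
            newV = []
            for y in range(n):
                # Value of continuing from the current state y.
                cont = r[y] + beta * sum(P[y][z] * V[z] for z in range(n))
                newV.append(max(cont, restart))
            if max(abs(a - b) for a, b in zip(newV, V)) < tol:
                V = newV
                break
            V = newV
        indices.append((1 - beta) * V[x])
    return indices

# Hypothetical two-state arm: state 0 pays 1, state 1 pays 0,
# transitions are uniform from either state.
r = [1.0, 0.0]
P = [[0.5, 0.5], [0.5, 0.5]]
nu = gittins_indices(r, P, beta=0.9)
print(nu)
```

Since both states share the same transition row here, the indices can be checked by hand: restarting in state 0 earns 1 every period, so its (normalized) index is 1.0, while state 1's index works out to 0.45.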

Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

  • Qing Zhao
  • Computer Science
    Multi-Armed Bandits
  • 2019
Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments, and arise in contexts such as knowledge retrieval and reinforcement learning.

Monotonic Approximation of the Gittins Index

The Gittins index is useful in the study of bandit processes and Markov decision processes, and can be approximated by finite-horizon break-even values determined in the truncated finite-horizon problems.

Stochastic Linear Bandits with Finitely Many Arms

The core idea is to introduce phases of determinism into the algorithm so that within each phase the actions are chosen independently of the observed rewards.

The Multi-Armed Bandit Problem: Computational Aspects

The problem is to find a selection rule which maximizes the α-discounted rewards.

Finite State and Action MDPS

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider both finite and infinite horizon models.

Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems

The approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.

  • S. Villar, J. Bowden, J. Wason
  • Computer Science
    Statistical science : a review journal of the Institute of Mathematical Statistics
  • 2015
A novel bandit-based patient allocation rule is proposed that overcomes the issue of low power, removing a potential barrier to their use in practice; the analysis indicates that bandit approaches offer significant advantages but also severe limitations in terms of their resulting statistical power.

Multi‐Armed Bandits, Gittins Index, and its Calculation

Multi-armed bandit is a colorful term that refers to the dilemma faced by a gambler playing in a casino with multiple slot machines (which were colloquially called one-armed bandits).

A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

A new fast-pivoting algorithm is obtained that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations.



Bandit processes and dynamic allocation indices

The paper aims to give a unified account of the central concepts in recent work on bandit processes and dynamic allocation indices, and to show how these reduce some previously intractable problems to the problem of calculating such indices.

Linear programming and finite Markovian control problems

This text is a revised version of the author's thesis for the University of Leiden and is mainly concerned with the theory of finite Markov decision problems. Such problems are those in which a decision maker controls the evolution of a finite Markov chain through the actions chosen at each stage.

Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints

This paper investigates the computation of transient-optimal policies in discrete dynamic programming; the concept of superharmonicity is introduced, which provides the linear program to compute the transient value-vector and a transient-optimal policy.

Some aspects of the sequential design of experiments

Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined before the experimentation begins.

Optimization Over Time

Finite State Markovian Decision Processes

The Multi-Armed Bandit Problem: Decomposition and Computation

It is shown that an approximate largest-index rule yields an approximately optimal policy for the N-project problem, and more efficient methods of computing the indices on-line and/or for sparse transition matrices in large state spaces than have been suggested heretofore.

Extensions of the Multi-Armed Bandit Problem

  • 1984

Discussant of J. C. Gittins

  • 1979