Sequential Design of Experiments via Linear Programming


The celebrated multi-armed bandit problem in decision theory models the central trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In this paper we study the variant of the multi-armed bandit problem where the exploration phase involves costly experiments and occurs before the exploitation…

