The multi-armed bandit, with constraints

The colorfully named and much-studied multi-armed bandit is the following Markov decision problem: At epochs 1, 2, ..., a decision maker observes the current state of each of several Markov chains with rewards (bandits) and plays one of them. The Markov chains that are not played remain in their current states. The Markov chain that is played evolves for…
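To make the dynamics concrete, here is a minimal simulation sketch of the model described above: several Markov chains, only the played chain makes a transition, and the others stay frozen. The transition matrices `P`, reward vectors `r`, and the myopic selection rule are illustrative assumptions, not taken from the paper; for the classical discounted problem without constraints, the optimal policy is the Gittins index policy rather than the myopic rule used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state chains: P[i] is the transition matrix of arm i,
# r[i][s] is the reward collected when arm i is played in state s.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
r = [np.array([1.0, 0.0]),
     np.array([0.6, 0.6])]

states = [0, 0]          # current state of every chain
total_reward = 0.0

for epoch in range(100):
    # Observe the state of every chain and play one of them.
    # (Myopic rule, for illustration only: play the arm whose current
    # state yields the largest immediate reward.)
    arm = max(range(len(P)), key=lambda i: r[i][states[i]])
    total_reward += r[arm][states[arm]]
    # Only the played chain evolves; all other chains remain in place.
    states[arm] = rng.choice(2, p=P[arm][states[arm]])

print(total_reward)
```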