- Published 2012 in Annals OR
DOI: 10.1145/2185395.2185430

The colorfully named and much-studied multi-armed bandit is the following Markov decision problem: At epochs 1, 2, ..., a decision maker observes the current state of each of several Markov chains with rewards (bandits) and plays one of them. The Markov chains that are not played remain in their current states. The Markov chain that is played evolves for…
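The process described above can be sketched as a small simulation. This is a hypothetical illustration, not the paper's code: each `Bandit` is a Markov chain with per-state rewards; at each epoch a policy picks one chain to play, that chain transitions, and all other chains stay frozen in their current states. The class and function names are invented for this sketch.

```python
import random

class Bandit:
    """A Markov chain with rewards: one arm of the multi-armed bandit."""

    def __init__(self, transitions, rewards, state=0):
        # transitions[s] is a list of (next_state, probability) pairs
        self.transitions = transitions
        # rewards[s] is the reward collected when the chain is played in state s
        self.rewards = rewards
        self.state = state

    def play(self, rng):
        # Collect the reward for the current state, then transition.
        r = self.rewards[self.state]
        nxt, probs = zip(*self.transitions[self.state])
        self.state = rng.choices(nxt, weights=probs)[0]
        return r

def run(bandits, policy, epochs, seed=0):
    """Simulate the bandit process: only the played chain evolves each epoch."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(epochs):
        i = policy(t, bandits)          # decision maker observes states, picks an arm
        total += bandits[i].play(rng)   # the played chain evolves; the rest stay put
    return total
```

For example, a single deterministic two-state chain that pays 1 in state 0 and 0 in state 1, and moves to state 1 after the first play, yields total reward 1 regardless of how many further epochs it is played.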