Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems


In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which, when pulled, yields a positive reward. A player sequentially selects one of the arms at each time step. The goal of the player is to maximize its…
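The abstract does not spell out how the arms behave between pulls. The following is a minimal simulation sketch of the setting. It assumes the common restless-bandit model, which is our reading of "uncontrolled" and not taken from the text above: each arm is a finite-state Markov chain whose state keeps evolving at every step whether or not it is pulled, the reward of a pulled arm depends on that arm's current state, and the player observes only the reward of the arm it pulls. `MarkovArm`, `simulate`, and the `policy(t, history)` signature are hypothetical names for illustration. They are not the paper's notation or its learning algorithm.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]  # (arm pulled, reward observed) pairs


@dataclass
class MarkovArm:
    """Hypothetical arm: a finite-state Markov chain (assumed model)."""
    transition: List[List[float]]  # transition[s][s2] = P(next state s2 | state s)
    rewards: List[float]           # positive reward earned if pulled in state s
    state: int = 0

    def step(self, rng: random.Random) -> None:
        # Assumed "uncontrolled" dynamics: the chain advances every step,
        # independent of whether the player pulled this arm.
        n = len(self.rewards)
        self.state = rng.choices(range(n), weights=self.transition[self.state])[0]


def simulate(arms: List[MarkovArm],
             policy: Callable[[int, History], int],
             horizon: int,
             seed: int = 0) -> float:
    """Run a policy for `horizon` steps and return its cumulative reward."""
    rng = random.Random(seed)
    history: History = []
    total = 0.0
    for t in range(horizon):
        k = policy(t, history)                  # player selects one arm
        reward = arms[k].rewards[arms[k].state]
        total += reward
        history.append((k, reward))             # only the pulled arm's reward is recorded
        for arm in arms:                        # every arm evolves regardless
            arm.step(rng)
    return total


# Usage: arm 0 alternates deterministically between states 0 and 1;
# arm 1 has a single state that always pays 0.5.
alternating = MarkovArm(transition=[[0.0, 1.0], [1.0, 0.0]], rewards=[1.0, 3.0])
steady = MarkovArm(transition=[[1.0]], rewards=[0.5])
# Pull arm 1 at t=0, then arm 0 at t=1; arm 0 moved to state 1 while idle.
print(simulate([alternating, steady], lambda t, h: 1 if t == 0 else 0, horizon=2))  # 3.5
```

The fixed `lambda` policy only exercises the dynamics. A learning policy would instead choose arms based on `history`, since under these assumptions the states of unpulled arms are never observed directly.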


3 Figures and Tables
