Threshold Bandits, With and Without Censored Feedback


We consider the Threshold Bandit setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a threshold value. The learner selects one of K actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the… (More)


  • Presentations referencing similar topics