Abstract

We study a Bayesian multi-armed bandit (MAB) setting in which a principal seeks to maximize the sum of expected time-discounted rewards obtained by pulling arms, when the arms are actually pulled by selfish and myopic individuals. Since such individuals pull the arm with highest expected posterior reward (i.e., they always exploit and never explore), the…
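
To make the agents' behavior concrete, here is a minimal sketch of myopic arm selection in a Beta-Bernoulli instance of the setting. The Bernoulli arms, the Beta(1, 1) priors, and the true_rates values are illustrative assumptions, not the paper's general model:

```python
import random

class BetaBernoulliArm:
    """Bernoulli arm with a Beta(alpha, beta) posterior over its success rate."""
    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha
        self.beta = beta

    def posterior_mean(self):
        # Expected posterior reward under the Beta posterior.
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward):
        # Standard Beta-Bernoulli conjugate update.
        if reward:
            self.alpha += 1.0
        else:
            self.beta += 1.0

def myopic_choice(arms):
    """A selfish, myopic agent pulls the arm with the highest expected
    posterior reward: it always exploits and never explores."""
    return max(range(len(arms)), key=lambda i: arms[i].posterior_mean())

random.seed(0)
true_rates = [0.4, 0.6]  # hidden success probabilities (hypothetical values)
arms = [BetaBernoulliArm() for _ in true_rates]
for _ in range(1000):
    i = myopic_choice(arms)
    arms[i].update(random.random() < true_rates[i])

print([round(a.posterior_mean(), 3) for a in arms])
```

Because each agent maximizes only the current posterior mean, a few early unlucky draws on the better arm can lock all subsequent pulls onto the worse one; this failure to explore is the tension between the principal's objective and the agents' incentives that the paper studies.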
DOI: 10.1145/2600057.2602897

Statistics

Citation velocity: 10, averaging 10 citations per year over the last 3 years (2015–2017).
