The Best of Both Worlds: Stochastic and Adversarial Bandits


We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal) whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O( √ n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial… (More)


1 Figure or Table


Citations per Year

Citation Velocity: 32

Averaging 32 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.

Slides referencing similar topics