Tighter Bounds for Multi-Armed Bandits with Expert Advice

@inproceedings{McMahan2009TighterBF,
  title={Tighter Bounds for Multi-Armed Bandits with Expert Advice},
  author={H. Brendan McMahan and Matthew J. Streeter},
  booktitle={COLT},
  year={2009}
}
Bandit problems are a classic way of formulating exploration versus exploitation tradeoffs. Auer et al. [ACBFS02] introduced the EXP4 algorithm, which explicitly decouples the set of A actions which can be taken in the world from the set of M experts (general strategies for selecting actions) with which we wish to be competitive. Auer et al. show that EXP4 has expected cumulative regret bounded by O(√(TA log M)), where T is the total number of rounds. This bound is attractive when the number of…
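For orientation, here is a minimal Python sketch of the baseline EXP4 scheme named in the abstract, following the standard textbook description attributed to Auer et al. [ACBFS02]. It is not the refined method or analysis of this paper. All function and parameter names (exp4_probabilities, exp4_update, run_exp4, advice_fn, reward_fn, gamma) are hypothetical, rewards are assumed to lie in [0, 1], and each expert's advice is assumed to be a probability distribution over the A actions.

```python
import math
import random


def exp4_probabilities(weights, advice, gamma):
    """Mix the experts' advice into one distribution over the A actions.

    weights: one positive weight per expert (M entries).
    advice:  advice[i][j] is the probability expert i assigns to action j.
    gamma:   exploration rate in (0, 1]; an assumed tuning parameter.
    """
    num_actions = len(advice[0])
    total = sum(weights)
    probs = []
    for j in range(num_actions):
        mixed = sum(w * xi[j] for w, xi in zip(weights, advice)) / total
        # Blend the weighted advice with uniform exploration over actions.
        probs.append((1.0 - gamma) * mixed + gamma / num_actions)
    return probs


def exp4_update(weights, advice, probs, action, reward, gamma):
    """Exponential-weights update from an importance-weighted reward estimate."""
    num_actions = len(probs)
    # Only the chosen action has a nonzero estimated reward.
    estimate = reward / probs[action]
    updated = [
        w * math.exp(gamma * xi[action] * estimate / num_actions)
        for w, xi in zip(weights, advice)
    ]
    # Rescaling by the maximum leaves the action distribution unchanged
    # and keeps the weights from overflowing on long runs.
    largest = max(updated)
    return [w / largest for w in updated]


def run_exp4(advice_fn, reward_fn, num_experts, num_rounds, gamma, seed=0):
    """Run the sketch for num_rounds; returns (final weights, total reward).

    advice_fn(t) returns the M advice vectors for round t (hypothetical hook).
    reward_fn(t, action) returns a reward in [0, 1] (hypothetical hook).
    """
    rng = random.Random(seed)
    weights = [1.0] * num_experts
    total_reward = 0.0
    for t in range(num_rounds):
        advice = advice_fn(t)
        probs = exp4_probabilities(weights, advice, gamma)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_fn(t, action)
        total_reward += reward
        weights = exp4_update(weights, advice, probs, action, reward, gamma)
    return weights, total_reward
```

In this sketch the learner only observes the reward of the action it actually played, which is why the update divides by the probability of that action. The choice of gamma is left to the caller here; the bound quoted above depends on tuning it as a function of T, A, and M.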