Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

@inproceedings{Dekel2012OnlineBL,
  title={Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret},
  author={Ofer Dekel and Ambuj Tewari and Raman Arora},
  booktitle={ICML},
  year={2012}
}
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of policy regret, which attempts to provide a more meaningful way to measure an online algorithm’s…

Citations

Publications citing this paper: 64 citations, estimated 42% coverage.

CITATION STATISTICS

  • 14 Highly Influenced Citations

  • Averaged 13 Citations per year over the last 3 years
