Online Stochastic Optimization under Correlated Bandit Feedback

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds that match the performance of state-of-the-art algorithms in their dependence on the number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas…

Publications referenced by this paper.

Near-optimal regret bounds for reinforcement learning

  • Jaksch, Thomas, Ortner, Ronald, Auer, Peter
  • Journal of Machine Learning Research,
  • 2010

Adaptive-tree bandits

  • Bull, Adam
  • arXiv preprint arXiv:1302.2489,
  • 2013

From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning

  • Munos, Rémi
  • Foundations and Trends in Machine Learning,
  • 2013

PAC-Bayes-Empirical-Bernstein inequality

  • Tolstikhin, Ilya O., Seldin, Yevgeny
  • In Advances in Neural Information Processing…
  • 2013
