Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

Sarah Filippi, Olivier Cappé, Aurélien Garivier
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focusing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient…
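The linear maximization under a KL constraint mentioned above amounts to maximizing the expected value q·V over transition distributions q that stay within a KL ball around the empirical estimate. The sketch below solves this numerically via bisection on the Lagrange multiplier; it is not the paper's algorithm, just an illustrative implementation assuming all estimated probabilities are positive and the radius eps is small enough for the constraint to bind.

```python
import numpy as np

def max_linear_kl(p_hat, V, eps, n_iter=100):
    """Maximize sum_i q_i * V_i over distributions q with KL(p_hat || q) <= eps.

    Illustrative sketch: the stationarity condition gives q_i proportional to
    p_hat_i / (lam - V_i) for a multiplier lam > max_i V_i, which we locate by
    bisection.  Assumes p_hat > 0 componentwise and that eps is small enough
    for the KL constraint to be tight at the optimum.
    """
    p_hat, V = np.asarray(p_hat, float), np.asarray(V, float)

    def q_of(lam):
        w = p_hat / (lam - V)
        return w / w.sum()

    def kl(lam):
        q = q_of(lam)
        return float(np.sum(p_hat * np.log(p_hat / q)))

    lo = V.max() + 1e-10   # near this end, q concentrates on argmax V: KL large
    hi = V.max() + 1e6     # far end: q is close to p_hat, so KL is near zero
    if kl(lo) <= eps:      # constraint cannot bind; return the extreme tilt
        return q_of(lo)
    for _ in range(n_iter):            # kl(lam) decreases in lam: bisect
        mid = 0.5 * (lo + hi)
        if kl(mid) > eps:
            lo = mid
        else:
            hi = mid
    return q_of(hi)
```

For example, with a uniform estimate over two states and values V = (0, 1), the optimistic distribution shifts mass toward the high-value state until the KL budget eps is exhausted.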


