Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

@article{Filippi2010OptimismIR,
  title={Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence},
  author={Sarah Filippi and Olivier Capp{\'e} and Aur{\'e}lien Garivier},
  journal={CoRR},
  year={2010},
  volume={abs/1004.5229}
}
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focusing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations, under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient…
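The inner step of such a KL-based optimistic strategy is a linear maximization over the probability simplex subject to a KL-ball constraint around the empirical transition distribution. The paper derives an efficient specialized solver for this problem; the sketch below instead uses a generic numerical optimizer purely for illustration, and the function name `kl_optimistic_vector` and the choice of SciPy's SLSQP routine are assumptions of this sketch, not the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def kl_optimistic_vector(p_hat, V, eps):
    """Naive numerical sketch of the inner optimistic step:
    maximize q . V over distributions q with KL(p_hat || q) <= eps.
    (The paper provides a much more efficient dedicated procedure;
    this generic SLSQP call is for illustration only.)"""
    n = len(p_hat)

    def kl(q):
        # KL(p_hat || q); entries with p_hat[i] == 0 contribute nothing.
        mask = p_hat > 0
        return np.sum(p_hat[mask] * np.log(p_hat[mask] / q[mask]))

    cons = (
        {"type": "eq",   "fun": lambda q: q.sum() - 1.0},  # simplex constraint
        {"type": "ineq", "fun": lambda q: eps - kl(q)},    # KL-ball constraint
    )
    res = minimize(lambda q: -(q @ V), x0=p_hat.copy(),
                   bounds=[(1e-12, 1.0)] * n, constraints=cons,
                   method="SLSQP")
    return res.x

# Example: empirical transitions over 3 next states, value estimates V.
p_hat = np.array([0.5, 0.3, 0.2])
V = np.array([0.0, 1.0, 2.0])
q_star = kl_optimistic_vector(p_hat, V, eps=0.1)
# q_star shifts probability mass toward high-value states relative to p_hat.
```

Within extended value iteration, a call of this form would be made for every state-action pair at each iteration, which is why an efficient closed-form-based solver, as developed in the paper, matters in practice.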
