@inproceedings{Szepesvri1997TheAC, title={The Asymptotic Convergence-Rate of Q-learning}, author={Csaba Szepesv{\'a}ri}, booktitle={NIPS}, year={1997} }

- Published 1997 in NIPS

In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum…
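
As a rough illustration of the case split in the abstract, the sketch below evaluates the stated rate for a given discount factor and pair of occupation frequencies. The function name `q_learning_rate` and its return format are hypothetical and not taken from the paper. The sketch drops the constants hidden by the O-notation and applies the case split mechanically to any discount factor in [0, 1), although the abstract states the result for discount factors above 1/2.

```python
import math


def q_learning_rate(gamma, p_min, p_max, t):
    """Evaluate the asymptotic rate from the abstract, ignoring constants.

    Hypothetical helper: returns (regime, exponent, value), where
    exponent = R * (1 - gamma) and R = p_min / p_max.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if not 0.0 < p_min <= p_max:
        raise ValueError("need 0 < p_min <= p_max")
    if t < 3:
        # log(log(t)) is positive only for t > e.
        raise ValueError("t must be at least 3")

    ratio = p_min / p_max              # R in the abstract
    exponent = ratio * (1.0 - gamma)   # R(1 - gamma)

    if exponent < 0.5:
        # Polynomial regime: 1 / t^{R(1 - gamma)}
        return "polynomial", exponent, t ** (-exponent)
    # Otherwise: sqrt(log log t / t)
    return "loglog", exponent, math.sqrt(math.log(math.log(t)) / t)


if __name__ == "__main__":
    regime, exponent, value = q_learning_rate(0.9, 0.1, 0.5, 1000)
    print(regime, exponent, value)
```

With a discount factor close to 1 or a small ratio R, the exponent R(1 - gamma) is close to 0, so the polynomial bound decays very slowly in t.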

