The Asymptotic Convergence-Rate of Q-learning

@inproceedings{Szepesvri1997TheAC,
  title={The Asymptotic Convergence-Rate of Q-learning},
  author={Csaba Szepesv{\'a}ri},
  booktitle={NIPS},
  year={1997}
}
In this paper we show that for discounted MDPs with discount factor γ > 1/2 the asymptotic rate of convergence of Q-learning is O(1/t^{R(1-γ)}) if R(1-γ) < 1/2 and O(sqrt(log log t / t)) otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here R = p_min/p_max is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that p_min > 0, where p_min and p_max now become the minimum and maximum…
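
As a rough illustration of how the two regimes in the abstract are selected, the following Python sketch computes the exponent R(1-γ), reading γ as the discount factor, and reports which bound applies. The function name `rate_regime` and its return format are hypothetical and are not taken from the paper; the sketch only restates the case split from the abstract and does not run Q-learning.

```python
def rate_regime(p_min: float, p_max: float, gamma: float):
    """Return (exponent, bound) for the case split quoted in the abstract.

    p_min, p_max: minimum and maximum state-action occupation frequencies.
    gamma: discount factor (accepted anywhere in [0, 1) so both branches
           of the case split can be exercised).
    """
    if not (0.0 < p_min <= p_max):
        raise ValueError("need 0 < p_min <= p_max")
    if not (0.0 <= gamma < 1.0):
        raise ValueError("need 0 <= gamma < 1")

    ratio = p_min / p_max              # R = p_min / p_max
    exponent = ratio * (1.0 - gamma)   # R(1 - gamma)

    if exponent < 0.5:
        # Polynomial regime: O(1 / t^{R(1 - gamma)})
        return exponent, "O(1/t^{R(1-gamma)})"
    # Otherwise: O(sqrt(log log t / t))
    return exponent, "O(sqrt(log log t / t))"


if __name__ == "__main__":
    # Non-uniform sampling with a large discount factor: slow polynomial rate.
    print(rate_regime(p_min=0.1, p_max=0.2, gamma=0.9))
    # Uniform sampling with a small discount factor: the sqrt(log log t / t) bound.
    print(rate_regime(p_min=0.25, p_max=0.25, gamma=0.2))
```

Since R ≤ 1, any γ > 1/2 gives R(1-γ) < 1/2, so under the abstract's stated assumption on the discount factor the first branch is always the one that applies.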
