REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs

@inproceedings{Bartlett2009REGALAR,
  title={REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs},
  author={Peter L. Bartlett and Ambuj Tewari},
  booktitle={UAI},
  year={2009}
}
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√(AT)). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating…
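To make the quantities in the abstract concrete, here is a minimal Python sketch. It assumes the standard definition of the span of a vector (largest entry minus smallest entry) and evaluates only the leading term H·S·√(A·T) of the stated bound, dropping the constants and logarithmic factors hidden by the Õ notation. The function names `span` and `regret_rate` and the example bias vector are hypothetical, chosen for illustration; they are not taken from the paper.

```python
import math


def span(h):
    """Span of a vector: largest entry minus smallest entry."""
    return max(h) - min(h)


def regret_rate(H, S, A, T):
    """Leading term H * S * sqrt(A * T) of the stated regret bound.

    Constants and logarithmic factors hidden by the O-tilde notation
    are deliberately ignored, so this is a growth rate, not a bound.
    """
    return H * S * math.sqrt(A * T)


# Hypothetical bias vector for a 3-state MDP (illustrative values only).
h = [0.0, 1.5, 4.0]
H = span(h)  # any upper bound on the span may be used in place of the span itself

print(regret_rate(H, S=3, A=2, T=10_000))
```

Because the leading term grows with √T, quadrupling the horizon T only doubles it, which is what a sublinear regret rate means in practice.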

Statistics

[Chart: Citations per Year, 2008 to 2018]

Semantic Scholar estimates that this publication has 102 citations based on the available data.
