To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning

Sridhar Mahadevan
Most work in reinforcement learning (RL) is based on discounted techniques, such as Q learning, where long-term rewards are geometrically attenuated based on the delay in their occurrence. Schwartz recently proposed an undiscounted RL technique called R learning that optimizes average reward, and argued that average reward is a better metric than the discounted one optimized by Q learning. In this paper we compare R learning with Q learning on a simulated robot box-pushing task. We compare these two…
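To make the contrast in the abstract concrete, here is a minimal Python sketch of the two objectives and of one tabular update step for each method. It is not taken from the paper: the update rules follow the commonly stated textbook forms of Q learning and of Schwartz's R learning, and the function names, the dictionary-based tables, and the default step sizes (`alpha`, `beta`, `gamma`) are illustrative assumptions.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Discounted objective: a reward delayed by t steps is scaled by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def average_reward(rewards):
    """Undiscounted objective: mean reward per step, independent of delay."""
    return sum(rewards) / len(rewards)


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (assumed textbook form)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def r_update(R, rho, s, a, r, s_next, actions, beta=0.1, alpha=0.01):
    """One tabular R-learning step (assumed textbook form).

    rho is the running estimate of average reward; it is only adjusted
    when the action taken was greedy. Returns the new rho.
    """
    best_here = max(R[(s, b)] for b in actions)
    best_next = max(R[(s_next, b)] for b in actions)
    was_greedy = R[(s, a)] == best_here
    R[(s, a)] += beta * (r - rho + best_next - R[(s, a)])
    if was_greedy:
        rho += alpha * (r + best_next - best_here - rho)
    return rho


# Hypothetical usage: tables default to 0.0 for unseen state-action pairs.
Q = defaultdict(float)
R = defaultdict(float)
actions = ["left", "right"]
q_update(Q, s=0, a="left", r=1.0, s_next=1, actions=actions)
rho = r_update(R, 0.0, s=0, a="left", r=1.0, s_next=1, actions=actions)
```

In this sketch the only structural difference between the two updates is that Q learning scales the next-state value by `gamma`, while R learning subtracts the average-reward estimate `rho` from the immediate reward instead.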
This paper has 42 citations.

Publications citing this paper.

The QV family compared to other reinforcement learning algorithms

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning • 2009

Market Model Benchmark Suite for Machine Learning Techniques

IEEE Computational Intelligence Magazine • 2018

Optimal routing control of a construction machine by deep reinforcement learning

2018 IEEE 15th International Workshop on Advanced Motion Control (AMC) • 2018

Multi-robot collaboration based on Markov decision process in Robocup3D soccer simulation game

The 27th Chinese Control and Decision Conference (2015 CCDC) • 2015


Publications referenced by this paper.

Reinforcement Learning for Robots using Neural Networks

L. Lin
PhD thesis, Carnegie-Mellon Univ, • 1993

Dynamic Programming: Deterministic and Stochastic Models

D. Bertsekas

Applied Statistics: A Handbook of Techniques

L. Sachs
Springer-Verlag, • 1982

Learning to Solve Markovian Decision Processes

S. Singh
PhD thesis, Univ of Massachusetts, • 1994

H-learning: A reinforcement learning method to optimize undiscounted average reward

P. Tadepalli, D. Ok
Technical Report 94-30-01, • 1994

Robot Learning

J. Connell, S. Mahadevan
Kluwer Academic Publishers, • 1993
