Bridging the Gap Between Value and Policy Based Reinforcement Learning

@inproceedings{Nachum2017BridgingTG,
  title={Bridging the Gap Between Value and Policy Based Reinforcement Learning},
  author={Ofir Nachum and Mohammad Norouzi and Kelvin Xu and Dale Schuurmans},
  booktitle={NIPS},
  year={2017}
}
We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax consistent action values correspond to optimal entropy regularized policy probabilities along any action sequence, regardless of provenance. From this observation, we develop a new RL algorithm, Path Consistency Learning (PCL), that minimizes a notion of…
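As a reading aid, the consistency relation the abstract refers to can be sketched in standard entropy-regularized notation (optimal value V*, optimal policy π*, reward r, discount γ, temperature τ; deterministic transitions s' → this is a paraphrase in LaTeX, not a verbatim excerpt from the paper):

  V^*(s) = \tau \log \sum_{a} \exp\!\big( ( r(s,a) + \gamma V^*(s') ) / \tau \big),
  \qquad
  \pi^*(a \mid s) = \exp\!\big( ( r(s,a) + \gamma V^*(s') - V^*(s) ) / \tau \big).

Telescoping the resulting one-step identity V^*(s) - \gamma V^*(s') = r(s,a) - \tau \log \pi^*(a \mid s) along any sub-trajectory gives a multi-step path consistency,

  V^*(s_t) - \gamma^{d} V^*(s_{t+d})
  = \sum_{i=0}^{d-1} \gamma^{i} \big[ r(s_{t+i}, a_{t+i}) - \tau \log \pi^*(a_{t+i} \mid s_{t+i}) \big],

and PCL trains parametric value and policy estimates by penalizing violations of this identity on both on-policy and replayed (off-policy) trajectories.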
Highly Cited
This paper has 60 citations.
Related Discussions
This paper has been referenced on Twitter 140 times.

Citations

Publications citing this paper.

61 Citations

Citations per Year (2017–2019): Semantic Scholar estimates that this publication has 61 citations based on the available data.


