Corpus ID: 14818104

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

@article{Balduzzi2015CompatibleVG,
  title={Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies},
  author={D. Balduzzi and Muhammad Ghifary},
  journal={ArXiv},
  year={2015},
  volume={abs/1509.03005}
}
This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, its gradient, and determine the actor's policy respectively. We evaluate GProp on two…
26 Citations
Continuous control with deep reinforcement learning
  • 4,474
How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
  • 2
Accelerating Online Reinforcement Learning with Offline Datasets
  • 17
  • Highly Influenced
Parametric Circuit Optimization with Reinforcement Learning
  • C. Tang, Z. Ye, Y. Wang
  • Computer Science
  • 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
  • 2018
  • 1
MQLV: Modified Q-Learning for Vasicek Model
Reentry trajectory optimization based on Deep Reinforcement Learning
Learning Continuous Control Policies by Stochastic Value Gradients
  • 349
  • Highly Influenced
Memory-based control with recurrent neural networks
  • 141
Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

References

SHOWING 1-10 OF 51 REFERENCES
Policy Gradient Methods for Reinforcement Learning with Function Approximation
  • 3,490
  • Highly Influential
Value-gradient learning
  • M. Fairbank, E. Alonso
  • Computer Science
  • The 2012 International Joint Conference on Neural Networks (IJCNN)
  • 2012
  • 39
From Pixels to Torques: Policy Learning with Deep Dynamical Models
  • 111
Fast gradient-descent methods for temporal-difference learning with linear function approximation
  • 467
Online learning control by association and reinforcement
  • J. Si, Yu-Tsung Wang
  • Computer Science
  • Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium
  • 2000
  • 293
End-to-End Training of Deep Visuomotor Policies
  • 2,024
An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time
  • 31
Human-level control through deep reinforcement learning
  • 11,614
  • Highly Influential
Reinforcement Learning: An Introduction
  • 27,843
  • Highly Influential