Rémy Degenne

Learn More
In this paper we study Temporal Difference (TD) Learning with linear value function approximation. The classic TD algorithm is known to be unstable with linear function approximation and off-policy learning. Recently developed Gradient TD (GTD) algorithms have addressed this problem successfully. Despite their prominent properties of good scalability and(More)
  • 1