Vassilis A. Papavassiliou

  • Citations Per Year
Learn More
A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis(More)
  • 1