Learn More
In this paper, we propose an efficient algorithm for estimating the natural policy gradient using parameter-based exploration; this algorithm samples directly in the parameter space. Unlike previous methods based on natural gradients, our algorithm calculates the natural policy gradient using the inverse of the exact Fisher information matrix. The(More)
Reinforcement learning is a useful tool for complex control problems that cannot be modeled mathematically nor solved theoretically. Direct policy search(DPS) is an approach for reinforcement learning that represents a policy using some model and searches an optimal parameter directly by optimization techniques such as genetic algorithms(GA). Instance-based(More)
  • 1