Matthew Jon Grounds

One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space is increased. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, …
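The Q-learning component referred to above follows the standard tabular update rule. As a minimal sketch (not the paper's PLANQ-learning method itself, which additionally shapes the reward with a STRIPS planner), the core update can be written as:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); unseen entries default to 0.
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q

# Example: a single rewarded transition from state 's' to state 't'
Q = {}
q_update(Q, 's', 0, 1.0, 't', actions=[0, 1])
```

The state-space difficulty the abstract mentions is visible here: the dict `Q` grows with the number of (state, action) pairs, and each pair must be visited many times before the values converge.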
In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our …
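For reference, a single SARSA(λ) update with a linear function approximator can be sketched as below. This is a generic illustration of the algorithm the abstract names, not the paper's parallel implementation; the feature vectors `phi_sa` and step-size choices are assumptions for the example.

```python
import numpy as np

def sarsa_lambda_step(w, z, phi_sa, r, phi_next, alpha=0.1, gamma=0.99, lam=0.9):
    """One on-policy SARSA(lambda) update with a linear approximator
    q(s,a) = w . phi(s,a) and accumulating eligibility traces z."""
    delta = r + gamma * w @ phi_next - w @ phi_sa  # TD error for the (s,a,r,s',a') step
    z = gamma * lam * z + phi_sa                   # decay traces, then accumulate current features
    w = w + alpha * delta * z                      # credit all recently active features
    return w, z

# Example with 2-dimensional features: one rewarded step
w = np.zeros(2)
z = np.zeros(2)
w, z = sarsa_lambda_step(w, z, np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]))
```

Because the update is a dense vector operation over the weight vector `w`, it lends itself to the kind of parallel-hardware speedups the abstract describes.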