Mark D. Pendrith

It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation when the Markov property cannot be assumed. We show that for a general class of …
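The stability property mentioned above can be illustrated with classical policy iteration: a policy is stable when greedy improvement against its own value function returns the same policy, and in an MDP that fixed point is optimal. Below is a minimal sketch on a hypothetical two-state, two-action MDP (the transition and reward numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP.
# P[s][a] = list of (next_state, prob); R[s][a] = immediate reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}
gamma = 0.9

def evaluate(policy):
    # Exact policy evaluation: solve V = R_pi + gamma * P_pi V.
    n = len(P)
    A = np.eye(n)
    b = np.zeros(n)
    for s in range(n):
        a = policy[s]
        b[s] = R[s][a]
        for s2, p in P[s][a]:
            A[s, s2] -= gamma * p
    return np.linalg.solve(A, b)

def improve(V):
    # Greedy improvement with respect to V.
    return {s: max(P[s], key=lambda a: R[s][a] +
                   gamma * sum(p * V[s2] for s2, p in P[s][a]))
            for s in P}

policy = {0: 0, 1: 0}
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:  # stable under improvement => optimal (MDP case)
        break
    policy = new_policy
```

Here the loop terminates at the policy that always takes action 1, the optimal one; the paper's question is what happens to this equivalence when the Markov property fails.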
This paper argues that for many domains, we can expect credit-assignment methods that use actual returns to be more effective for reinforcement learning than the more commonly used temporal difference methods. We present analysis and empirical evidence from three sets of experiments in different domains to support this claim. A new algorithm we call C-Trace, a …
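The contrast between the two credit-assignment schemes can be sketched in a few lines: TD(0) bootstraps from the current estimate of the next state, while an actual-return method updates each visited state toward the observed return. The episode and step size below are illustrative assumptions, not the paper's experiments:

```python
gamma = 1.0   # undiscounted, episodic
alpha = 0.5   # step size
# One episode: (state, reward received on leaving that state).
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]

# TD(0): update toward r + gamma * V(next state).
V_td = {"A": 0.0, "B": 0.0, "C": 0.0}
for (s, r), nxt in zip(episode, episode[1:] + [(None, 0.0)]):
    v_next = V_td.get(nxt[0], 0.0)  # 0 for the terminal state
    V_td[s] += alpha * (r + gamma * v_next - V_td[s])

# Actual returns (Monte Carlo): update toward the observed return G_t,
# accumulated backward through the episode.
V_mc = {"A": 0.0, "B": 0.0, "C": 0.0}
G = 0.0
for s, r in reversed(episode):
    G = r + gamma * G
    V_mc[s] += alpha * (G - V_mc[s])
```

After this single episode the actual-return update has propagated the terminal reward to every visited state, whereas TD(0) has only moved the estimate of the state adjacent to the reward; this faster propagation is the kind of effect the paper's argument turns on.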
This paper introduces the RL-TOPs architecture for robot learning, a hybrid system combining teleo-reactive planning and reinforcement learning techniques. The aim of this system is to speed up learning by decomposing complex tasks into hierarchies of simple behaviours which can be learnt more easily. Behaviours learnt in this way can subsequently be …