
- Mark Pendrith
- 1994

If reinforcement learning (RL) techniques are to be used for "real world" dynamic system control, the problems of noise and plant disturbance will have to be addressed. This study investigates the effects of noise/disturbance on five different RL algorithms: Watkins' Q-Learning (QL); Barto, Sutton and Anderson's Adaptive Heuristic Critic (AHC); Sammut and Law's…
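The one-step Q-learning rule at the centre of this study can be sketched as follows; the tiny state space, the update constants, and the Gaussian reward noise are illustrative assumptions, not the paper's experimental setup.

```python
import random

# Minimal tabular one-step Q-learning (Watkins' QL), one of the five
# algorithms whose noise sensitivity the study compares.
ALPHA, GAMMA = 0.1, 0.9          # illustrative learning rate and discount
ACTIONS = ["left", "right"]
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

def q_update(s, a, reward, s_next):
    """Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    target = reward + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def noisy_reward(r, sigma=0.5):
    """Plant disturbance / sensor noise modelled (hypothetically) as
    additive Gaussian noise on the observed reward."""
    return r + random.gauss(0.0, sigma)
```

With noise, the bootstrapped target itself becomes a noisy random variable, which is what makes the comparison between update rules interesting.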

- Mark D. Pendrith, Michael McGarity
- ICML
- 1998

It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation when the Markov property cannot be assumed. We show that for a general class of…

- Theodore J. Perkins, Mark D. Pendrith
- ICML
- 2002

- Malcolm R. K. Ryan, Mark D. Pendrith
- ICML
- 1998

This paper introduces the RL-TOPs architecture for robot learning, a hybrid system combining teleo-reactive planning and reinforcement learning techniques. The aim of this system is to speed up learning by decomposing complex tasks into hierarchies of simple behaviours which can be learnt more easily. Behaviours learnt in this way can subsequently be…

It has previously been established that for Markov learning automata games, the game equilibria are exactly the optimal strategies (Witten, 1977; Wheeler & Narendra, 1986). In this paper, we extend the game theoretic view of reinforcement learning to consider the implications for "group rationality" (Wheeler & Narendra, 1986) in the more general situation…

- Mark D. Pendrith
- Agents
- 2000

In this paper, we report on novel reinforcement learning techniques applied to a real-world application. The problem domain, a traffic engineering application, is formulated as a distributed reinforcement learning problem, where the returns of many agents are simultaneously updating a single shared policy. Learning occurs off-line in a traffic simulator,…

- Mark D. Pendrith
- 1999

In on-line reinforcement learning, often a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning…
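The variance problem the abstract describes can be seen in miniature with a single constant-step-size estimator tracking a noisy target; this is a hypothetical illustration of the phenomenon, not the paper's algorithm or reduction technique.

```python
import random
import statistics

# A constant step-size estimator q <- q + alpha * (sample - q) tracking a
# noisy target has steady-state variance of roughly
# alpha / (2 - alpha) * var(noise): larger alpha means noisier estimates.
def run_estimator(alpha, true_value=1.0, noise_sd=1.0, steps=20000, seed=0):
    rng = random.Random(seed)
    q, tail = 0.0, []
    for t in range(steps):
        sample = true_value + rng.gauss(0.0, noise_sd)
        q += alpha * (sample - q)
        if t >= steps // 2:        # measure after a burn-in period
            tail.append(q)
    return statistics.pvariance(tail)

print(run_estimator(0.5) > run_estimator(0.05))  # → True
```

The same trade-off applies per Q-value entry, which is why estimator variance translates directly into uneven or unstable learning.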

- Mark D. Pendrith, Malcolm R. K. Ryan
- ICML
- 1996

This paper argues that for many domains, we can expect credit-assignment methods that use actual returns to be more effective for reinforcement learning than the more commonly used temporal difference methods. We present analysis and empirical evidence from three sets of experiments in different domains to support this claim. A new algorithm we call C-Trace, a…
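The two credit-assignment targets being contrasted can be sketched side by side; the episode rewards and the value estimate below are invented for illustration, and this is not an implementation of the paper's C-Trace algorithm.

```python
GAMMA = 0.9  # illustrative discount factor

def td0_target(reward, v_next):
    """Temporal-difference target: bootstrap from the learner's own
    (possibly inaccurate) estimate of the next state's value."""
    return reward + GAMMA * v_next

def actual_return(rewards):
    """Actual-return (Monte-Carlo style) target: the discounted sum of
    the rewards really received for the rest of the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
    return g

episode = [0.0, 0.0, 1.0]           # rewards after leaving the start state
print(actual_return(episode))       # 0.81 = 0.9**2 * 1.0, estimate-free
print(td0_target(0.0, v_next=0.5))  # 0.45, depends on the current estimate
```

The contrast is the crux of the paper's argument: the actual-return target is unbiased but higher-variance, while the bootstrapped TD target inherits any bias in the current value estimates.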

- Mark D. Pendrith
- 1997

In reinforcement learning, as in many on-line search techniques, a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning…

- Mark D. Pendrith
- 1996

There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. How to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-learning appropriate, or are other…