Experimental analysis of eligibility traces strategies in temporal difference learning

Jinsong Leng, Lakhmi C. Jain and Colin Fyfe. Int. J. Knowl. Eng. Soft Data Paradigms.
Temporal difference (TD) learning is a model-free reinforcement learning technique that adopts an infinite-horizon discounted model and uses incremental learning for dynamic programming. The state value function is updated from sample episodes. Eligibility traces are a key mechanism for improving the rate of convergence. TD(λ) denotes the use of eligibility traces, controlled by the parameter λ. However, the underlying mechanism of eligibility traces with an…
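The backward view of TD(λ) mentioned in the abstract can be sketched as follows. This is a minimal tabular sketch with accumulating eligibility traces, not the authors' experimental setup: the five-state random-walk environment, the function name `td_lambda_random_walk` and all parameter values are illustrative assumptions. At each step the TD error δ = r + γV(s') − V(s) is applied to every state in proportion to its trace, and all traces then decay by γλ.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=2000, alpha=0.01, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) prediction with accumulating eligibility traces.

    Illustrative environment (an assumption, not from the paper): a random walk over
    non-terminal states 1..n_states with terminal states 0 and n_states + 1.
    Stepping into the right terminal gives reward 1; every other transition gives 0.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)              # V[0] and V[n_states + 1] are terminal and stay 0
    for _ in range(episodes):
        e = [0.0] * (n_states + 2)          # traces are reset at the start of each episode
        s = (n_states + 1) // 2             # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next in (0, n_states + 1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            v_next = 0.0 if terminal else V[s_next]
            delta = r + gamma * v_next - V[s]   # one-step TD error
            e[s] += 1.0                         # accumulating trace for the state just left
            for i in range(1, n_states + 1):
                V[i] += alpha * delta * e[i]    # update each state in proportion to its trace
                e[i] *= gamma * lam             # then decay every trace by gamma * lambda
            if terminal:
                break
            s = s_next
    return V[1:n_states + 1]


# True values for this walk are 1/6, 2/6, ..., 5/6.
estimates = td_lambda_random_walk(episodes=20000, alpha=0.001)
```

Setting lam=0 makes every trace vanish after one step, so the update reduces to one-step TD(0); lam=1 with γ = 1 approaches an every-visit Monte Carlo update.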


Intelligent Inventory Control: Is Bootstrapping Worth Implementing?

The results show that the benefit of bootstrapping is questionable when applied to inventory problems, and significance tests could not confirm that bootstrapping statistically significantly reduced the costs of inventory controlled by an RL agent.



Convergence Analysis on Approximate Reinforcement Learning

The aim of this paper is to propose a methodology for analysing the performance of the TD(λ) learning algorithm and adaptively selecting a set of optimal parameter values for it.

Reinforcement Learning with Replacing Eligibility Traces

This paper introduces a new kind of eligibility trace, the replacing trace, analyzes it theoretically, and shows that it results in faster, more reliable learning than the conventional trace, and significantly improves performance and reduces parameter sensitivity on the "Mountain-Car" task.
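The difference between the conventional (accumulating) trace and the replacing trace lies in a single line of the update. The sketch below uses a hypothetical helper, `trace_after_visits`, that tracks traces only and omits value updates: each step decays every trace by γλ, then either adds 1 to the visited state's trace (accumulating) or resets it to 1 (replacing).

```python
def trace_after_visits(kind, visits, n_states, gamma, lam):
    """Return eligibility traces after visiting the given sequence of states.

    kind is "accumulating" or "replacing"; this helper is illustrative only.
    """
    e = [0.0] * n_states
    for s in visits:
        e = [gamma * lam * x for x in e]    # every trace decays by gamma * lambda per step
        if kind == "accumulating":
            e[s] += 1.0                     # conventional trace: add 1, can grow beyond 1
        elif kind == "replacing":
            e[s] = 1.0                      # replacing trace: reset to 1, never exceeds 1
        else:
            raise ValueError(f"unknown trace kind: {kind!r}")
    return e


acc = trace_after_visits("accumulating", [0, 0, 0], n_states=2, gamma=1.0, lam=0.9)
rep = trace_after_visits("replacing", [0, 0, 0], n_states=2, gamma=1.0, lam=0.9)
```

Visiting state 0 three times in a row with γλ = 0.9 drives the accumulating trace to 1 + 0.9 + 0.81 = 2.71, while the replacing trace stays at 1.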

Evolutionary Function Approximation for Reinforcement Learning

A fully implemented instantiation of evolutionary function approximation is presented which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method, and the resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators.

Incremental multi-step Q-learning

A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

TD(λ) Converges with Probability 1

This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.


This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
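A minimal sketch of the tabular Q-learning update whose convergence the theorem addresses, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The four-state deterministic chain, the function name `q_learning_chain`, and the parameter values are illustrative assumptions. A uniformly random behaviour policy keeps every action sampled in every state, and Q is a discrete lookup table, matching the two conditions quoted above; the constant step size α is a simplification that suffices here because the chain is deterministic.

```python
import random


def q_learning_chain(n_states=4, episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on an illustrative deterministic chain (an assumption, not from the paper).

    States 0..n_states-1; the last state is the terminal goal. Action 0 moves left
    (staying put at state 0), action 1 moves right. Entering the goal yields reward 1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # discrete lookup table Q[s][a]
    for _ in range(episodes):
        s = rng.randrange(goal)                  # random non-goal start state
        while s != goal:
            a = rng.randrange(2)                 # uniform random behaviour: every action keeps being tried
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Off-policy target: bootstrap from the greedy action value unless s_next is terminal
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning_chain()
```

With γ = 0.9 and the goal at state 3, the learned values for moving right should approach 0.81, 0.9 and 1.0 in states 0, 1 and 2, even though the behaviour policy never acts greedily.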

Reinforcement Learning of Competitive Skills with Soccer Agents

Reinforcement learning algorithms are adopted to verify goal-oriented agents' competitive and cooperative learning abilities for decision making, and the function approximation technique known as tile coding (TC) is used to generate value functions, which prevents the value function from growing exponentially with the number of state values.
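A minimal sketch of tile coding for a two-dimensional input in [0, 1]². Several grid tilings, each shifted by a fraction of a tile width, cover the input space; a point activates exactly one tile per tiling, and the approximate value is the sum of the weights of the active tiles. The grid size, the uniform diagonal offsets, and the names `active_tiles` and `TileCodedValue` are illustrative assumptions, not the soccer-agent implementation described in that paper.

```python
def active_tiles(x, y, n_tilings=4, tiles_per_dim=8):
    """Return one active tile index per tiling for a point (x, y) in [0, 1]^2."""
    width = 1.0 / tiles_per_dim
    side = tiles_per_dim + 1            # one extra tile per row/column so shifted grids still cover [0, 1]
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings  # tiling t is shifted by t / n_tilings of a tile width
        col = min(max(int((x + offset) // width), 0), side - 1)
        row = min(max(int((y + offset) // width), 0), side - 1)
        indices.append(t * side * side + row * side + col)
    return indices


class TileCodedValue:
    """Linear value function over binary tile features (one weight per tile)."""

    def __init__(self, n_tilings=4, tiles_per_dim=8):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.w = [0.0] * (n_tilings * (tiles_per_dim + 1) ** 2)

    def value(self, x, y):
        return sum(self.w[i] for i in active_tiles(x, y, self.n_tilings, self.tiles_per_dim))

    def update(self, x, y, target, alpha=0.5):
        # Gradient step on squared error; dividing alpha by n_tilings keeps the
        # effective step size independent of the number of tilings.
        tiles = active_tiles(x, y, self.n_tilings, self.tiles_per_dim)
        error = target - self.value(x, y)
        for i in tiles:
            self.w[i] += (alpha / self.n_tilings) * error


v = TileCodedValue()
for _ in range(100):
    v.update(0.3, 0.3, target=1.0)
```

Because neighbouring points share most of their active tiles, an update at one point generalizes to its neighbourhood: after training at (0.3, 0.3), the point (0.32, 0.3) shares three of its four tiles and receives 0.75 of the learned value, while a distant point such as (0.9, 0.9) is unaffected.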

Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning

This book examines the mathematical governing principles of simulation-based optimization, thereby providing the reader with the ability to model relevant real-life problems using these techniques, and outlines the computational technology underlying these methods.

Learning to Predict by the Methods of Temporal Differences

This article introduces a class of incremental learning procedures specialized for prediction (that is, for using past experience with an incompletely known system to predict its future behavior), proves their convergence and optimality for special cases, and relates them to supervised-learning methods.