# Fast Online Q(λ)

@article{Wiering2004FastOQ, title={Fast Online Q($\lambda$)}, author={Marco A. Wiering and J{\"u}rgen Schmidhuber}, journal={Machine Learning}, year={1998}, volume={33}, pages={105--115} }

Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
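The postponement idea can be illustrated with a small sketch. This is not the paper's exact algorithm (which also handles trace cutoffs and periodic numerical resets); it is a simplified lazy variant of naive online Q(λ) with replacing traces, and all class and parameter names are illustrative. Each state/action pair records the global step and the running discounted TD-error sum at its last visit; its pending trace-weighted updates are applied only when its Q-value is next read, so one step touches O(|A|) table entries instead of every traced pair.

```python
from collections import defaultdict

GAMMA, LAMBDA, ALPHA = 0.95, 0.9, 0.1   # illustrative hyperparameters
DECAY = GAMMA * LAMBDA                   # per-step eligibility decay

class FastQLambda:
    """Lazy-update sketch of online Q(lambda) with replacing traces.

    A pair visited at step t0 has eligibility DECAY**(k - t0) at step k,
    so its total pending update is a difference of the running sum
    delta_sum = sum_k delta_k * DECAY**k, rescaled by DECAY**(-t0).
    (DECAY**(-t0) grows with t0; the paper avoids overflow via resets.)
    """

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = defaultdict(float)   # lazily maintained Q-values
        self.visit = {}               # (s, a) -> (t0, delta_sum at t0)
        self.t = 0                    # global step counter
        self.delta_sum = 0.0          # sum_k delta_k * DECAY**k

    def Q(self, s, a):
        """Flush any pending lazy update before reading Q(s, a)."""
        if (s, a) in self.visit:
            t0, d0 = self.visit[(s, a)]
            self.q[(s, a)] += ALPHA * DECAY ** (-t0) * (self.delta_sum - d0)
            self.visit[(s, a)] = (t0, self.delta_sum)
        return self.q[(s, a)]

    def step(self, s, a, r, s2):
        """One online transition: compute the TD error, (re)set the trace."""
        v2 = max(self.Q(s2, b) for b in range(self.n_actions))
        delta = r + GAMMA * v2 - self.Q(s, a)
        # the visited pair has eligibility 1 right now: update it directly
        self.q[(s, a)] += ALPHA * delta
        # fold delta into the global sum consumed later by older traces
        self.delta_sum += delta * DECAY ** self.t
        self.visit[(s, a)] = (self.t, self.delta_sum)   # replacing trace
        self.t += 1
```

On short runs this reproduces, pair for pair, the Q-table of an eager implementation that decays and applies every trace at every step; the difference is purely in per-step cost.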

## 73 Citations

Reducing the Time Complexity of Goal-Independent Reinforcement Learning

- Computer Science
- 2004

This paper presents a technique for reducing the update complexity of CQL to O(|A|) with little impact on performance.

Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

- Computer Science, Mathematics
- 2018 International Joint Conference on Neural Networks (IJCNN)
- 2018

This work introduces a novel means of generalizing agent experiences for large-scale Markov decision processes based on a kernel local linear regression function approximation, which it combines with Q-learning.

Transfer Method for Reinforcement Learning in Same Transition Model -- Quick Approach and Preferential Exploration

- Computer Science
- 2011 10th International Conference on Machine Learning and Applications and Workshops
- 2011

An effective transfer learning method for tasks sharing the same transition model consists of two strategies: quickly approaching the goal of the selected source task, and preferentially exploring states around that goal.

Preferential exploration method of transfer learning for reinforcement learning in Same Transition Model

- Computer Science
- The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
- 2012

An effective transfer learning method for tasks sharing the same transition model consists of two strategies: quickly approaching the goal of the selected source task, and preferentially exploring states around that goal.

SSPQL: Stochastic shortest path-based Q-learning

- Computer Science
- 2011

A stochastic shortest path-based Q-learning method (SSPQL) is proposed, combining a stochastic shortest-path-finding method with Q-learning, a well-known model-free RL method, to address the problem of slow convergence when deriving an optimal policy in practical applications.

Effective Reuse Method for Transfer Learning in Actor-critic

- Computer Science
- 2010

This paper proposes a policy-reuse method, based on a proposed source-task selection method, for transfer learning with actor-critic, one of the major reinforcement learning algorithms.

On-policy concurrent reinforcement learning

- Computer Science
- J. Exp. Theor. Artif. Intell.
- 2004

It is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions, and it is shown, experimentally, that the new techniques can learn better policies than the previous algorithms during some phases of the exploration.

Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance

- Computer Science
- TAROS
- 2013

This paper presents a comparative analysis of three Reinforcement Learning algorithms and their heuristically-accelerated variants (HAQL, HAQ(λ), and HAQS), where heuristics bias action selection, thus speeding up the learning.

Incremental topological reinforcement learning agent in non-structured environments

- Computer Science
- 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583)
- 2004

A new reinforcement learning model, the incremental topological reinforcement learning agent (ITRLA), is designed to guide agent navigation in non-structured environments, considering two common situations: insertion of noise during state estimation and changes in environment structure.

Heuristically-Accelerated Multiagent Reinforcement Learning

- Computer Science
- IEEE Transactions on Cybernetics
- 2014

The results show that even the most straightforward heuristics can produce virtually optimal action-selection policies in far fewer episodes, significantly improving the performance of HAMRL over vanilla RL algorithms.

## References

Showing 1-10 of 32 references

Speeding up Q(λ)-Learning

- Computer Science
- ECML
- 1998

The faster Q(λ)-learning algorithm is based on the observation that Q-value updates may be postponed until they are needed, and its worst case complexity is bounded by the number of actions.

Incremental multi-step Q-learning

- Computer Science
- Machine Learning
- 2004

A novel incremental algorithm that combines Q-learning with the TD(λ) return-estimation process typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.

Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning

- Computer Science
- J. Artif. Intell. Res.
- 1995

Examination of the issues of the efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms optimizing the discounted sum of rewards suggests that using λ > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.
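The truncated λ-return underlying the TTD procedure can be sketched directly from its recursive definition, z_t = r_t + γ[(1-λ)V(s_{t+1}) + λ z_{t+1}], evaluated backward over a buffer of the last m transitions. The function below is a minimal illustration (names are ours, not the paper's); for λ=0 it reduces to the one-step TD(0) target, and for λ=1 to the m-step discounted return bootstrapped with V(s_{t+m}).

```python
def ttd_return(rewards, values, gamma, lam):
    """Truncated TD(lambda) return for the oldest state in an m-step buffer.

    rewards: [r_t, ..., r_{t+m-1}]
    values:  [V(s_{t+1}), ..., V(s_{t+m})]  (current value estimates)
    """
    z = values[-1]  # bootstrap from the newest state's value
    for r, v in zip(reversed(rewards), reversed(values)):
        # recursive lambda-return: mix one-step bootstrap with deeper return
        z = r + gamma * ((1.0 - lam) * v + lam * z)
    return z
```

Updating only the state leaving the buffer with this target gives TD(λ)-like credit assignment at near-TD(0) per-step cost, which is the speedup the reference reports.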

Q-learning

- Computer Science
- Machine Learning
- 2004

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

On-line Q-learning using connectionist systems

- Computer Science
- 1994

Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.

Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning

- Computer Science
- 1995

The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that indeed only approximates TD(λ), but requires very little computation per action and can be used with arbitrary function representation methods.

The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms

- Computer Science
- Machine Learning
- 2005

The complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks is analyzed to prove that the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain.

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

- Computer Science
- NIPS
- 1995

It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

Technical Note: Q-Learning

- Computer Science
- Machine Learning
- 2004

A convergence theorem is presented and proved, showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

Reinforcement Learning with Replacing Eligibility Traces

- Computer Science
- Machine Learning
- 2005

This paper introduces a new kind of eligibility trace, the replacing trace, analyzes it theoretically, and shows that it results in faster, more reliable learning than the conventional trace, significantly improving performance and reducing parameter sensitivity on the "Mountain-Car" task.
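The two trace types differ only in how a visit marks the trace: a conventional (accumulating) trace adds 1 to the visited state's eligibility, while a replacing trace resets it to 1 so repeated visits cannot inflate credit. A minimal sketch (function and variable names are illustrative):

```python
def update_traces(e, s, gamma, lam, replacing=True):
    """Decay all eligibility traces, then mark the just-visited state s.

    e: dict mapping states to eligibility values.
    """
    for k in e:
        e[k] *= gamma * lam          # fade credit for older visits
    if replacing:
        e[s] = 1.0                   # replacing trace: reset to 1
    else:
        e[s] = e.get(s, 0.0) + 1.0   # accumulating trace: add 1
```

On a state visited twice in quick succession, the accumulating trace exceeds 1 while the replacing trace stays capped at 1, which is the mechanism behind the reduced parameter sensitivity the reference reports.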