Fast Online Q(λ)

@article{Wiering1998FastOQ,
  title={Fast Online Q($\lambda$)},
  author={Marco A. Wiering and J{\"u}rgen Schmidhuber},
  journal={Machine Learning},
  year={1998},
  volume={33},
  pages={105--115}
}
Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed. 
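To make the postponed-update idea concrete, here is a minimal tabular sketch, not the authors' published pseudocode: instead of decaying every eligibility trace on every step, a single global sum of trace-weighted TD errors is kept, and a Q-value is brought up to date only when it is actually read (the visited pair plus the |A| values of the successor state), so only O(|A|) entries are touched per step. The class name and the lazy "flush" helper are illustrative, and the sketch omits both the trace resetting after exploratory actions and the rescaling the paper uses to keep the (γλ)^t terms numerically bounded.

from collections import defaultdict
import random

class LazyQLambda:
    """Illustrative lazy tabular Q(lambda); a sketch, not the published algorithm."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam, self.eps = alpha, gamma, lam, epsilon
        self.gl = gamma * lam
        self.t = 0                         # global time step
        self.S = 0.0                       # running sum: S_t = sum_k delta_k * gl**k
        self.q = defaultdict(float)        # Q(s, a)
        self.ebar = defaultdict(float)     # trace in "time-0 units": true trace = ebar * gl**t
        self.s_last = defaultdict(float)   # value of S at the last flush of (s, a)

    def _flush(self, s, a):
        # Apply all postponed trace-weighted TD errors to Q(s, a) in O(1).
        key = (s, a)
        self.q[key] += self.alpha * self.ebar[key] * (self.S - self.s_last[key])
        self.s_last[key] = self.S

    def value(self, s):
        # Only the |A| entries of the queried state are refreshed.
        for a in self.actions:
            self._flush(s, a)
        return max(self.q[(s, a)] for a in self.actions)

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        self.value(s)                      # make Q(s, .) current before the argmax
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def step(self, s, a, r, s_next):
        self._flush(s, a)
        delta = r + self.gamma * self.value(s_next) - self.q[(s, a)]
        self.ebar[(s, a)] += self.gl ** (-self.t)   # accumulating trace for the visited pair
        self.S += delta * self.gl ** self.t         # all other updates stay postponed
        self.t += 1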
Reducing the Time Complexity of Goal-Independent Reinforcement Learning
TLDR
This paper presents a technique for reducing the update complexity of CQL to O(|A|) with little impact on performance.
Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes
  • I. Sledge, J. Príncipe
  • Computer Science, Mathematics
    2018 International Joint Conference on Neural Networks (IJCNN)
  • 2018
TLDR
This work introduces a novel means of generalizing agent experiences for large-scale Markov decision processes based on a kernel local linear regression function approximation, which it combines with Q-learning.
Transfer Method for Reinforcement Learning in Same Transition Model -- Quick Approach and Preferential Exploration
TLDR
An effective transfer learning method for tasks sharing the same transition model consists of two strategies: quickly approaching the goal of the selected source task, and preferentially exploring states around the goal.
Preferential exploration method of transfer learning for reinforcement learning in Same Transition Model
TLDR
An effective transfer learning method for tasks sharing the same transition model consists of two strategies: quickly approaching the goal of the selected source task, and preferentially exploring states around the goal.
SSPQL: Stochastic shortest path-based Q-learning
TLDR
A stochastic shortest path-based Q-learning (SSPQL) is proposed, combining a stochastic shortest path-finding method with Q-learning, a well-known model-free RL method, to solve the problem of slow convergence when deriving an optimum policy in practical applications.
Effective Reuse Method for Transfer Learning in Actor-critic
TLDR
This paper proposes a policy-reuse method, based on a proposed selection method, for transfer learning in the actor-critic framework, one of the major reinforcement learning algorithms.
On-policy concurrent reinforcement learning
TLDR
It is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions, and it is shown, experimentally, that the new techniques can learn better policies than the previous algorithms during some phases of the exploration.
Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance
TLDR
This paper presents a comparative analysis of three Reinforcement Learning algorithms and their heuristically-accelerated variants (HAQL, HAQ(λ), and HAQS), where heuristics bias action selection, thus speeding up the learning.
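As a rough illustration of the heuristic acceleration mentioned in this entry (the exact HAQL weighting is not reproduced here), the heuristic value H(s, a) only biases epsilon-greedy action selection, while the learning update itself is left unchanged; the scaling factor xi and the dict-based heuristic table below are placeholder assumptions.

import random

def ha_epsilon_greedy(q_row, h_row, xi=1.0, epsilon=0.1):
    # q_row, h_row: dicts mapping action -> Q-value / heuristic value for the current state.
    if random.random() < epsilon:
        return random.choice(list(q_row))
    # the heuristic enters only here; the stored Q-values remain unbiased
    return max(q_row, key=lambda a: q_row[a] + xi * h_row.get(a, 0.0))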
Incremental topological reinforcement learning agent in non-structured environments
  • A. Braga, A. Araujo, J. Wyatt
  • Computer Science
    2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583)
  • 2004
TLDR
A new reinforcement learning model is presented, the incremental topological reinforcement learning agent (ITRLA), designed to guide agent navigation in non-structured environments, considering two common situations: insertion of noise during state estimation and changes in environment structure.
Heuristically-Accelerated Multiagent Reinforcement Learning
TLDR
The results show that even the most straightforward heuristics can produce virtually optimal action selection policies in far fewer episodes, significantly improving the performance of HAMRL over vanilla RL algorithms.
...
...

References

SHOWING 1-10 OF 32 REFERENCES
Speeding up Q(λ)-Learning
TLDR
The faster Q(λ)-learning algorithm is based on the observation that Q-value updates may be postponed until they are needed, and its worst case complexity is bounded by the number of actions.
Incremental multi-step Q-learning
TLDR
A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.
Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning
TLDR
Examination of the issues of efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms optimizing the discounted sum of rewards, suggests that using λ > 0 with the TTD procedure yields a significant learning speedup at essentially the same cost as standard TD(0) learning.
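For reference, a small sketch of the truncated λ-return that TTD-style methods compute from a fixed window of recent transitions; the backward recursion below is a standard formulation, and the function and argument names are illustrative assumptions.

def truncated_lambda_return(rewards, next_values, gamma, lam):
    # rewards[k] and next_values[k] are the reward and estimated V(s_{k+1}) of the
    # k-th buffered transition; the result is the m-step truncated lambda-return
    # for the oldest buffered state, where m = len(rewards).
    g = next_values[-1]                    # bootstrap from the newest state
    for r, v in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v + lam * g)
    return g

The oldest buffered state is then updated toward this target, e.g. V(s) += alpha * (g - V(s)), at a cost bounded by the window length m rather than the full episode.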
Q-learning
TLDR
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
On-line Q-learning using connectionist systems
TLDR
Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning
TLDR
The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that indeed only approximates TD(λ), but requires very little computation per action and can be used with arbitrary function representation methods.
The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms
TLDR
The complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks is analyzed to prove that the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain.
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
TLDR
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
Technical Note: Q-Learning
TLDR
A convergence theorem is presented, proving that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Reinforcement Learning with Replacing Eligibility Traces
TLDR
This paper introduces a new kind of eligibility trace, the replacing trace, analyzes it theoretically, and shows that it results in faster, more reliable learning than the conventional trace and that it significantly improves performance and reduces parameter sensitivity on the "Mountain-Car" task.
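To illustrate the distinction drawn in this reference, here is a minimal tabular helper contrasting the two trace types; the decay factor gamma*lam and the dict-based trace table are assumptions of the sketch, not the paper's notation.

def decay_and_mark(traces, visited, gamma, lam, replacing=True):
    # Decay every existing eligibility trace, then mark the visited (state, action) pair.
    for key in traces:
        traces[key] *= gamma * lam
    if replacing:
        traces[visited] = 1.0                                 # replacing trace: reset to 1
    else:
        traces[visited] = traces.get(visited, 0.0) + 1.0      # accumulating trace: add 1
    return traces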
...
...