Fast Online Q(λ)

@article{Wiering1998FastOQ,
  title={Fast Online Q($\lambda$)},
  author={Marco A. Wiering and J{\"u}rgen Schmidhuber},
  journal={Machine Learning},
  year={1998},
  volume={33},
  pages={105--115}
}
Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed. 
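As a rough, non-authoritative illustration of the postponed-update idea (not the paper's exact algorithm), here is a minimal tabular sketch in Python. It implements the "naive" Q(λ) variant (replacing traces, no trace cutting on exploratory actions), keeps a global running sum of discounted TD errors, and folds pending corrections into a Q-value only when that value is actually read, so each step touches O(|A|) table entries rather than the whole table. All names and constants are illustrative, and renormalisation of the global accumulators (needed in long runs to avoid underflow) is omitted.

import numpy as np

class LazyQLambda:
    """Sketch of lazy Q(lambda) updates: accumulate discounted TD errors
    globally and apply them to a Q-value only when it is read."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, lam=0.8):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.phi = 0.0        # running sum of delta_k * (gamma*lam)**k
        self.decay = 1.0      # (gamma*lam)**t for the current step t
        # per-pair bookkeeping: phi at the last sync, global decay weight
        # at the last visit, and whether the pair has been visited at all
        self.phi_sync = np.zeros((n_states, n_actions))
        self.visit_w = np.ones((n_states, n_actions))
        self.visited = np.zeros((n_states, n_actions), dtype=bool)

    def q_values(self, s):
        """Apply pending updates to the |A| entries of state s, then return them."""
        for a in range(self.Q.shape[1]):
            if self.visited[s, a]:
                pending = (self.phi - self.phi_sync[s, a]) / self.visit_w[s, a]
                self.Q[s, a] += self.alpha * pending
                self.phi_sync[s, a] = self.phi
        return self.Q[s]

    def step(self, s, a, r, s_next):
        """Process one transition; the per-step cost is O(|A|), not O(|S||A|)."""
        delta = r + self.gamma * np.max(self.q_values(s_next)) - self.q_values(s)[a]
        # (re)visit (s, a): its replacing trace is 1 from this step onward
        self.visited[s, a] = True
        self.phi_sync[s, a] = self.phi
        self.visit_w[s, a] = self.decay
        # fold this step's TD error into the global accumulators
        self.phi += delta * self.decay
        self.decay *= self.gamma * self.lam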
Reducing the Time Complexity of Goal-Independent Reinforcement Learning
TLDR
This paper presents a technique for reducing the update complexity of CQL to O(|A|) with little impact on performance.
Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes
  • I. Sledge, J. Príncipe · 2018 International Joint Conference on Neural Networks (IJCNN) · 2018
TLDR
This work introduces a novel means of generalizing agent experiences for large-scale Markov decision processes based on a kernel local linear regression function approximation, which it combines with Q-learning.
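The paper's kernel local linear regression is not reproduced here; as a loosely related, hedged illustration of kernel-based generalization over continuous states, the sketch below uses plain Gaussian kernel smoothing over stored experiences. The memory layout, bandwidth, and function name are assumptions for illustration only.

import numpy as np

def kernel_q_estimate(query_state, memory_states, memory_q, bandwidth=0.5):
    """Kernel-smoothed Q-value estimate at a continuous state.

    memory_states: (n, d) array of previously visited states
    memory_q:      (n, |A|) array of their current Q estimates
    Returns a (|A|,) vector of Q-values for the query state.
    """
    sq_dist = np.sum((memory_states - query_state) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    weights /= weights.sum() + 1e-12       # normalise the kernel weights
    return weights @ memory_q              # weighted average per action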
Transfer Method for Reinforcement Learning in Same Transition Model -- Quick Approach and Preferential Exploration
TLDR
An effective transfer learning method in same transition model consists of two strategies: approaching to the goal for the selected source task quickly, and exploring states around the goal preferentially.
Preferential exploration method of transfer learning for reinforcement learning in Same Transition Model
TLDR
An effective transfer learning method in same transition model consists of two strategies: approaching to the goal for the selected source task quickly, and exploring states around the goal preferentially.
Concurrent Q‐learning: Reinforcement learning for dynamic goals and environments
TLDR
A powerful new algorithm is presented for reinforcement learning in problems where both the goals and the environment may change; it adapts quickly and intelligently to changes in the environment and the reward structure, and does not suffer interference from training undertaken prior to those changes.
A Comparative Study of Model-Free Reinforcement Learning Approaches
This study explores and compares three model-free learning methods, namely deep Q-networks (DQN), dueling deep Q-networks (DDQN), and state-action-reward-state-action (SARSA).
SSPQL: Stochastic shortest path-based Q-learning
TLDR
A stochastic shortest path-based Q-learning (SSPQL) method is proposed, combining a stochastic shortest-path-finding method with Q-learning, a well-known model-free RL method, to address the problem of slow convergence when deriving an optimum policy in practical applications.
Effective Reuse Method for Transfer Learning in Actor-critic
TLDR
This paper proposes a method for reusing policies in actor-critic learning, one of the major reinforcement learning algorithms, via transfer learning, based on a proposed selection method.
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
TLDR
This paper addresses the general problem of reinforcement learning (RL) in partially observable environments, and describes RNN-based AIs (RNNAIs) designed to do the same, guided by algorithmic information theory.
On-policy concurrent reinforcement learning
TLDR
It is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions, and it is shown, experimentally, that the new techniques can learn better policies than the previous algorithms during some phases of the exploration.
...

References

SHOWING 1-10 OF 32 REFERENCES
Speeding up Q(λ)-Learning
TLDR
The faster Q(λ)-learning algorithm is based on the observation that Q-value updates may be postponed until they are needed, and its worst case complexity is bounded by the number of actions.
Speeding up Q(λ)-learning
TLDR
The faster algorithm is based on the observation that Q-value updates may be postponed until they are needed, and its worst case complexity is bounded by the number of actions.
Incremental multi-step Q-learning
TLDR
A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.
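For contrast with the lazy scheme sketched above, here is a minimal version (assuming a tabular setting, replacing traces, and the "naive" variant that never cuts traces on exploratory actions) of the explicit eligibility-trace update, whose per-step cost scales with the number of eligible state/action pairs:

import numpy as np

def q_lambda_step(Q, e, s, a, r, s_next, alpha=0.1, gamma=0.99, lam=0.8):
    """One online Q(lambda) update with an explicit trace table.
    Q and e are both (|S|, |A|) arrays; every eligible pair is touched."""
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # one-step TD error
    e[s, a] = 1.0                  # replacing trace for the visited pair
    Q += alpha * delta * e         # credit all eligible pairs at once
    e *= gamma * lam               # decay every trace
    return Q, e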
Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning
TLDR
Examination of the efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms that optimize the discounted sum of rewards, suggests that using λ > 0 with the TTD procedure yields a significant learning speedup at essentially the same cost as standard TD(0) learning.
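As a hedged illustration of the truncation idea only (not Cichosz's exact TTD procedure), the sketch below computes an m-step truncated λ-return by the usual backward recursion, bootstrapping from the last stored state value; the argument names are illustrative assumptions.

def truncated_lambda_return(rewards, next_values, gamma=0.99, lam=0.8):
    """m-step truncated lambda-return.

    rewards[k]     = r_{t+k}        for k = 0 .. m-1
    next_values[k] = V(s_{t+k+1})   for k = 0 .. m-1
    """
    g = next_values[-1]                      # bootstrap at the truncation point
    for r, v in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v + lam * g)
    return g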
Q-learning
TLDR
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
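For reference, a short sketch of the one-step tabular backup that this convergence result concerns (array layout and step size are illustrative):

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Watkins-style one-step Q-learning backup on a (|S|, |A|) table:
    move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q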
On-line Q-learning using connectionist systems
TLDR
Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning
TLDR
The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(λ), but requires very little computation per action and can be used with arbitrary function representation methods.
The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
TLDR
The complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks is analyzed and it is proved that the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain.
The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms
TLDR
The complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks is analyzed to prove that the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain.
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
TLDR
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
...