Corpus ID: 235417170

Preferential Temporal Difference Learning

@article{Anand2021PreferentialTD,
  title={Preferential Temporal Difference Learning},
  author={N. Anand and Doina Precup},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.06508}
}
Abstract

Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are visited. When the agent lands in a state, its value can be used to compute the TD-error, which is then propagated to other states. However, it may be interesting, when computing updates, to take into account other information than whether a state is…
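The TD-error update described in the abstract can be sketched in tabular form. The random-walk environment, step size, and episode count below are illustrative choices for a minimal TD(0) sketch, not details from the paper:

```python
import random

def td0_chain(episodes=2000, alpha=0.1, gamma=1.0, seed=0):
    """Tabular TD(0) on a toy 5-state random-walk chain (hypothetical
    environment). Each visited state's value is nudged toward the target
    r + gamma * V[s']; the TD-error delta drives the update."""
    rng = random.Random(seed)
    n = 5                       # non-terminal states 0..4; terminals at -1 and 5
    V = [0.0] * n
    for _ in range(episodes):
        s = 2                   # start in the middle of the chain
        while 0 <= s < n:
            s_next = s + rng.choice((-1, 1))    # random-walk policy
            r = 1.0 if s_next == n else 0.0     # reward only at right terminal
            v_next = V[s_next] if 0 <= s_next < n else 0.0
            delta = r + gamma * v_next - V[s]   # TD-error
            V[s] += alpha * delta               # update the visited state only
            s = s_next
    return V

values = td0_chain()
```

Note that only the state just visited is updated, which is exactly the behavior the abstract contrasts with schemes that weight updates by additional information.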

References

SHOWING 1-10 OF 36 REFERENCES
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.
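The emphatic weighting referred to here combines an interest signal with a "followon" trace. The sketch below shows that weighting for a single trajectory; the ratios, interest, and discount values are illustrative placeholders, and the recursion is a simplification of the scheme in the cited paper:

```python
def emphasis_sequence(rhos, gamma=0.9, lam=0.0, interest=1.0):
    """Return the emphasis M_t at each step of a trajectory, given the
    importance-sampling ratios rho_{t-1} (one per step)."""
    F = 0.0
    Ms = []
    for rho_prev in rhos:
        # followon trace: discounted, off-policy-corrected accumulation
        # of interest from earlier time steps
        F = gamma * rho_prev * F + interest
        # emphasis mixes the raw interest and the followon trace via lambda
        M = lam * interest + (1.0 - lam) * F
        Ms.append(M)
    return Ms
```

With all ratios equal to 1 (on-policy) and lam=0, the emphasis grows geometrically toward 1 / (1 - gamma), so later states in a trajectory receive larger updates.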
A First Empirical Study of Emphatic Temporal Difference Learning
This paper's experiments found that each method converged to a characteristic asymptotic level of error, with ETD performing better than linear TD(0), on on-policy and off-policy variations of the Mountain Car problem.
Incremental Off-policy Reinforcement Learning Algorithms
This dissertation addresses and overcomes two shortcomings of off-policy TD algorithms that preclude their widespread use in knowledge representation, and shows that the ratios can be eliminated from the updates by varying the amount of bootstrapping, a sophisticated technique for allowing a spectrum of multi-step TD algorithms.
Learning to Predict by the Methods of Temporal Differences
  • R. Sutton
  • Computer Science
  • Machine Learning
  • 1988
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and their relation to supervised-learning methods.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the field's intellectual foundations to the most recent developments and applications.
Matrix Iterative Analysis, volume 27
  • Springer Science & Business Media
  • 1999
Neuro-Dynamic Programming
From the Publisher: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of…
TD Models: Modeling the World at a Mixture of Time Scales
This work establishes the theoretical foundations of multi-scale models and derives TD algorithms for learning them. It treats only the prediction problem – that of learning a model and value function for the case of fixed agent behavior.
Temporal credit assignment in reinforcement learning
Forethought and Hindsight in Credit Assignment
We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new…