On temporal difference algorithms for continuous systems

  Alexandre Donzé
This article proposes a general, intuitive and rigorous framework for designing temporal difference algorithms to solve optimal control problems in continuous time and space. Within this framework, we derive a version of the classical TD(λ) algorithm as well as a new TD algorithm which is similar, but designed to be more accurate and to converge as fast as TD(λ) does for the best value of λ, without the burden of finding that value.
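For background, the classical discrete-time TD(λ) update that the article generalizes can be sketched as below. The random-walk task, the accumulating eligibility traces, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def td_lambda(n_states, episodes, alpha=0.1, gamma=0.9, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces on a toy
    random-walk chain: states 0..n_states-1, reward 1 on reaching the
    right end, 0 elsewhere. A minimal sketch, not the paper's algorithm."""
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        e = [0.0] * n_states              # eligibility traces, reset per episode
        s = n_states // 2                 # start in the middle of the chain
        while 0 < s < n_states - 1:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == n_states - 1 else 0.0
            # terminal states carry value 0
            v_next = V[s2] if 0 < s2 < n_states - 1 else 0.0
            delta = r + gamma * v_next - V[s]     # one-step TD error
            e[s] += 1.0                           # accumulate trace at current state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]      # credit all recently visited states
                e[i] *= gamma * lam               # decay traces
            s = s2
    return V
```

Run on a 7-state chain, the learned values increase toward the rewarded end; the trace decay `gamma * lam` is what lets a single TD error update every state visited earlier in the episode.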
Temporal-difference learning for online reachability analysis
This work proposes a novel online reachability update algorithm based on Temporal-Difference learning that is computationally more efficient and outperforms standard reachability-based controllers when it comes to other (non-safety) objectives.
Goal-Oriented Control of Self-Organizing Behavior in Autonomous Robots
We study adaptive control algorithms within a dynamical-systems approach for autonomous robots that cause the self-organization of coordinated behaviors without specific goals or particular …
Hybrid Systems 1.1 Introduction
    The research on hybrid systems at Verimag has as a major objective to export some ideas and insights originating from computer science toward other domains of applied science and engineering that do …


    Temporal Difference Learning in Continuous Time and Space
    • K. Doya
    • Mathematics, Computer Science
    • NIPS
    • 1995
    A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and …
    Reinforcement Learning in Continuous Time and Space
    • K. Doya
    • Mathematics, Medicine
    • Neural Computation
    • 2000
    This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) …
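The HJB-based, continuous-time TD error in Doya's formulation can be sketched via a simple Euler discretization. The function name, the forward-difference derivative estimate, and the use of τ as a discounting time constant are assumptions for illustration, not the paper's exact formulation.

```python
def continuous_td_error(r, v, v_next, dt, tau=1.0):
    """Euler-discretized continuous-time TD error,
        delta(t) = r(t) - V(t)/tau + dV/dt,
    where tau plays the role of a discounting time constant.
    A minimal sketch under the stated assumptions."""
    dv_dt = (v_next - v) / dt      # forward-difference estimate of dV/dt
    return r - v / tau + dv_dt
```

As a consistency check, dividing the standard discrete TD error with discount γ = 1 − dt/τ (and reward r·dt) by dt recovers this expression in the limit dt → 0.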
    A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions
    • R. Munos
    • Mathematics, Computer Science
    • Machine Learning
    • 2004
    A general convergence theorem is derived for RL algorithms that use only “approximations” of the initial data; it applies to model-based or model-free algorithms, with off-line or on-line updating, deterministic or stochastic state dynamics, and finite-element (FE) or finite-difference (FD) discretization methods.
    Relaxed dynamic programming in switching systems
    In order to simplify computational methods based on dynamic programming, a relaxed procedure based on upper and lower bounds of the optimal cost was recently introduced. The convergence properties of …
    Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems
    This paper describes variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-tree and derives a splitting criterion that allows one cell to efficiently take into account its impact on other cells when deciding whether to split.
    On the Convergence of Optimistic Policy Iteration
    • J. Tsitsiklis
    • Mathematics, Computer Science
    • J. Mach. Learn. Res.
    • 2002
    A finite-state Markov decision problem is considered and the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection, is established.
    Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
    It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
    Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur)
    The continuous TD(λ) algorithm is refined to handle situations with discontinuous states and controls, and the vario-eta algorithm is proposed as a simple but efficient method to perform gradient descent.
    Temporal Difference Learning and TD-Gammon
    • G. Tesauro
    • Computer Science
    • J. Int. Comput. Games Assoc.
    • 1995
    TD-GAMMON is a neural network that trains itself to be an evaluation function for the game of backgammon by playing against itself and learning from the outcome.
    Reinforcement Learning: An Introduction
    This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.