Learning to Act Using Real-Time Dynamic Programming

@article{Barto1995LearningTA,
  title={Learning to Act Using Real-Time Dynamic Programming},
  author={Andrew G. Barto and Steven J. Bradtke and Satinder Singh},
  journal={Artif. Intell.},
  year={1995},
  volume={72},
  pages={81-138}
}
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. [...] Key method: RTDP generalizes Korf's Learning-Real-Time-A* (LRTA*) algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-learning algorithm. [...]
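
A minimal sketch of one RTDP trial may help make the key method concrete. It assumes a finite stochastic shortest-path problem given as explicit tables; the names V, cost, P, actions, and is_goal are illustrative, not from the paper. At each step the agent performs a Bellman backup only at the state it currently occupies and then follows the greedy action, i.e., an asynchronous DP sweep restricted to the states actually visited.

import random

def rtdp_trial(V, actions, cost, P, start, is_goal, max_steps=1000):
    """Run one RTDP trial, updating the value table V in place.

    Assumed (hypothetical) interface: V maps every state to a cost estimate,
    e.g. initialized from a non-overestimating heuristic; cost[s][a] is the
    expected immediate cost of action a in state s; P[s][a] maps successor
    states to their probabilities; actions[s] lists the admissible actions in s.
    """
    s = start
    for _ in range(max_steps):
        if is_goal(s):
            break
        # Asynchronous DP step: back up the value of the current state only.
        q = {a: cost[s][a] + sum(p * V[t] for t, p in P[s][a].items())
             for a in actions[s]}
        a_greedy = min(q, key=q.get)   # greedy action w.r.t. the current V
        V[s] = q[a_greedy]
        # Execute (or simulate) the greedy action; sample the successor state.
        succs = list(P[s][a_greedy])
        weights = [P[s][a_greedy][t] for t in succs]
        s = random.choices(succs, weights=weights)[0]
    return V

Repeating such trials from the start states, with V initialized to a non-overestimating (admissible) cost estimate, is the trial-based setting for which the paper proves convergence to optimal values on the relevant states.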
Citations

Incremental dynamic programming for on-line adaptive optimal control
This dissertation expands the theoretical and empirical understanding of IDP algorithms, increases their domain of practical application, and proves convergence of a DP-based reinforcement learning algorithm to the optimal policy for any continuous domain.
Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms
This paper presents a convergence result for indirect adaptive asynchronous value iteration algorithms in the case where a look-up table is used to store the value function; the result implies convergence of several existing reinforcement learning algorithms.
Reinforcement Learning and Its Relationship to Supervised Learning
This chapter discusses stochastic sequential decision processes from the perspective of machine learning, focusing on reinforcement learning and its relationship to the more commonly studied supervised learning problems.
Learning for Adaptive Real-time Search
A novel algorithm is proposed that learns a heuristic function to be used specifically with a lookahead-based policy, selects the lookahead depth adaptively in each state, and gives the user control over the trade-off between exploration and exploitation.
Near-optimal intelligent control for continuous set-point regulator problems via approximate dynamic programming
Optimization theory provides a framework for determining the best decisions or actions with respect to some mathematical model of a process. This research focuses on learning to act in a near-optimal [...]
Learning for Adaptive Real-time Search (Jul 2004)
Abstract. Real-time heuristic search is a popular model of acting and learning in intelligent autonomous agents. Learning real-time search agents improve their performance over time by acquiring and [...]
Reinforcement Learning and Dynamic Programming
Provides a brief account of the methods being developed by reinforcement learning researchers, what is novel about them, and what advantages they might have over classical applications of dynamic programming to large-scale stochastic optimal control problems.
Large-scale dynamic optimization using teams of reinforcement learning agents
This dissertation uses a team of RL agents, each responsible for controlling one elevator car, to demonstrate the power of RL on a very large-scale stochastic dynamic optimization problem of practical utility.
Learning to Solve Markovian Decision Processes
This dissertation establishes a novel connection between stochastic approximation theory and RL that provides a uniform framework for understanding all the different RL algorithms proposed to date, and highlights a dimension that clearly separates RL research from prior work on DP.
Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards
An approach that links model-free (MF) and model-based (MB) learning in a new way: via the reward function. Knowledge of the world does not just provide a source of simulated experience for training an agent's instincts; it shapes the rewards that those instincts latch onto.

References

Showing 1-10 of 163 references
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures based on Watkins's Q-learning, a new kind of reinforcement learning.
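
Since both the abstract and this Dyna entry center on Watkins's Q-learning, a generic illustration of the tabular one-step update may be useful (written here in the usual reward-maximizing, discounted form; the function name and the alpha/gamma defaults are illustrative, not from either paper):

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One Watkins-style Q-learning backup on a tabular Q, a dict keyed by (state, action)."""
    best_next = max(Q[(s_next, b)] for b in actions[s_next])  # greedy value of the successor state
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

Dyna interleaves such backups on real experience with backups on transitions simulated from a learned model, which is what gives it its planning character.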
Reinforcement Learning Applied to Linear Quadratic Regulation
An algorithm based on Q-learning is described and proven to converge to the optimal controller for a large class of LQR problems, an important class of control problems involving continuous state and action spaces and requiring a simple type of non-linear function approximator.
On the Computational Economics of Reinforcement Learning
It is suggested that, given a fixed amount of computational power available per control action, it may be better to use a direct reinforcement learning method augmented with indirect techniques than to devote all available resources to a computationally costly indirect method.
A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming
Combining elements of the theory of dynamic programming with features appropriate for on-line learning has led to an approach Watkins has called incremental dynamic programming. Here we adopt this [...]
Self-improving reactive agents: case studies of reinforcement learning frameworks
This paper describes the learning agents and their performance, and summarizes the learning algorithms and the lessons learned from this study.
Planning by Incremental Dynamic Programming
The basic results and ideas of dynamic programming as they relate most directly to the concerns of planning in AI are presented; these form the theoretical basis for the incremental planning methods used in the integrated architecture Dyna.
Efficient memory-based learning for robot control
A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action.
Reinforcement learning is direct adaptive optimal control
An emerging deeper understanding of neural network reinforcement learning methods is summarized, obtained by viewing them as a synthesis of dynamic programming and stochastic approximation methods.
Adaptive Confidence and Adaptive Curiosity
Much of the recent research on adaptive neuro-control and reinforcement learning focuses on systems with adaptive 'world models'. Previous approaches, however, do not address the problem of modelling [...]
Learning in embedded systems
This dissertation addresses the problem of designing algorithms for learning in embedded systems, using Sutton's techniques for linear association and reinforcement comparison, while the interval estimation algorithm uses the statistical notion of confidence intervals to guide its generation of actions.