Reinforcement Learning: An Introduction

  • Richard S. Sutton, Andrew G. Barto
  • Published 2005
  • Computer Science
  • IEEE Transactions on Neural Networks
Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view…
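The temporal-difference idea from Part II can be illustrated with a small tabular TD(0) prediction sketch. It assumes a hypothetical five-state random walk (non-terminal states 1 to 5, terminal states 0 and 6, reward +1 only when stepping into state 6, no discounting), whose true state values are 1/6, 2/6, ..., 5/6. The function name and all parameters are illustrative choices, not code from the book.

```python
import random


def td0_random_walk(episodes=20000, alpha=0.01, seed=0):
    """Tabular TD(0) prediction on a hypothetical 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Each episode starts in
    state 3 and steps left or right with probability 1/2. Stepping into state 6
    yields reward +1; every other transition yields 0. No discounting.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0): nudge V(s) toward the bootstrapped target r + V(s')
            V[s] += alpha * (r + V[s_next] - V[s])
            s = s_next
    return V[1:6]


if __name__ == "__main__":
    for k, v in enumerate(td0_random_walk(), start=1):
        print(f"state {k}: estimate {v:.3f}, true value {k / 6:.3f}")
```

A small constant step size keeps the estimates close to the true values; larger values of `alpha` learn faster but fluctuate more around them.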
Use of Reinforcement Learning as a Challenge: A Review
This paper discusses the basic model of reinforcement learning (RL), the optimal policies used in RL, the main policies used to reward the agent, including model-free and model-based policies, and the scope for future research in reinforcement learning.
Algorithms for Reinforcement Learning
This book focuses on those reinforcement learning algorithms that build on the powerful theory of dynamic programming. It gives a fairly comprehensive catalog of learning problems and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
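As a concrete reference point for "algorithms that build on dynamic programming", here is a minimal value-iteration sketch. It assumes a hypothetical deterministic corridor (states 0 to 3, state 3 terminal, actions left/right, reward +1 for entering the terminal state, discount 0.9) whose model is fully known; the names `value_iteration`, `step`, and `q` are illustrative.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-10):
    """Value iteration on a hypothetical deterministic corridor with a known model."""
    terminal = n_states - 1
    V = [0.0] * n_states

    def step(s, a):
        """Move left (-1) or right (+1); the left wall clips. Entering terminal pays +1."""
        s_next = min(max(s + a, 0), terminal)
        return s_next, (1.0 if s_next == terminal else 0.0)

    def q(s, a):
        """One-step lookahead value r + gamma * V(s') using the known model."""
        s_next, r = step(s, a)
        return r + gamma * V[s_next]

    while True:
        delta = 0.0
        for s in range(terminal):  # the terminal state keeps value 0
            best = max(q(s, a) for a in (-1, 1))  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values
    policy = [max((-1, 1), key=lambda a: q(s, a)) for s in range(terminal)]
    return V, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(values, policy)
```

The converged values are 0.81, 0.9, and 1.0 for states 0 to 2, and the greedy policy moves right everywhere.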
Control Optimization with Reinforcement Learning
This chapter focuses on a relatively new methodology called reinforcement learning (RL), a form of simulation-based dynamic programming, primarily used for solving Markov and semi-Markov decision problems.
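Q-learning is one common example of simulation-based dynamic programming: instead of sweeping over a known transition model, it updates from transitions sampled by a simulator. The sketch below assumes a hypothetical deterministic corridor simulator (states 0 to 3, state 3 terminal, reward +1 on entering it) and epsilon-greedy exploration; the simulator, function names, and parameters are illustrative and not taken from the chapter.

```python
import random


def simulate(s, a, terminal=3):
    """Hypothetical simulator: deterministic corridor; entering the terminal state pays +1."""
    s_next = min(max(s + a, 0), terminal)
    return s_next, (1.0 if s_next == terminal else 0.0)


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, terminal=3, seed=0):
    rng = random.Random(seed)
    actions = (-1, 1)
    Q = {(s, a): 0.0 for s in range(terminal + 1) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next, r = simulate(s, a, terminal)
            # Target uses the greedy value of the sampled next state;
            # there is no bootstrap from the terminal state.
            if s_next == terminal:
                target = r
            else:
                target = r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q


if __name__ == "__main__":
    Q = q_learning()
    print({k: round(v, 3) for k, v in Q.items()})
```

No transition probabilities are ever used; the agent only needs a way to sample `(s_next, r)` for a chosen action.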
Reinforcement Learning: A Technical Introduction – Part I
The paper offers an opinionated introduction to the advantages and drawbacks of several algorithmic approaches, in order to present options for algorithm design.
Reinforcement Learning and Its Relationship to Supervised Learning
This chapter discusses stochastic sequential decision processes from the perspective of machine learning, focussing on reinforcement learning and its relationship to the more commonly studied supervised learning problems.
Derivative-Free Reinforcement Learning: A Review
Derivative-free reinforcement learning methods proposed to date are summarized and organized by aspect, including parameter updating, model selection, exploration, and parallel/distributed methods.
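To make "derivative-free" concrete, the sketch below tunes a policy parameter using only episode returns, never a gradient. It is a minimal (1+1)-style random search (hill climbing with Gaussian perturbations), chosen here as one simple member of the family rather than any specific method from the review; the toy linear system `x' = x + u`, the feedback policy `u = -k * x`, and all parameter values are assumptions for illustration.

```python
import random


def episode_return(k, x0=1.0, horizon=20, control_cost=0.1):
    """Return of the linear feedback policy u = -k * x on the toy system x' = x + u."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total -= x * x + control_cost * u * u  # quadratic cost counted as negative reward
        x = x + u
    return total


def random_search(iterations=200, sigma=0.3, seed=0):
    """(1+1)-style hill climbing: perturb k, keep the perturbation if the return improves.

    Only black-box evaluations of episode_return are used; no derivatives.
    """
    rng = random.Random(seed)
    k, best = 0.0, episode_return(0.0)
    for _ in range(iterations):
        candidate = k + rng.gauss(0.0, sigma)
        value = episode_return(candidate)
        if value > best:
            k, best = candidate, value
    return k, best


if __name__ == "__main__":
    k, ret = random_search()
    print(f"gain k = {k:.3f}, return = {ret:.3f}")
```

Because the search only compares returns, the same loop works unchanged if `episode_return` is replaced by a noisy or non-differentiable simulator, at the cost of needing many more evaluations.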
Online learning of shaping rewards in reinforcement learning
Influence Value Q-Learning: A Reinforcement Learning Algorithm for Multi Agent Systems
Multi-agent systems have become popular for solving computational problems such as e-commerce, scheduling in transportation problems, estimation of energy demand, content-based image retrieval, and others.
Opposition-Based Reinforcement Learning
  • H. Tizhoosh
  • Published 2006
  • Computer Science
  • J. Adv. Comput. Intell. Intell. Informatics
Opposition-based reinforcement learning, inspired by opposition-based learning, is introduced to speed up convergence. Considering opposite actions simultaneously enables individual states to be updated more than once, shortening exploration and expediting convergence.
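The core idea, updating a state for both the chosen action and its opposite, can be sketched on top of tabular Q-learning. This is a minimal illustration, not the paper's exact algorithm: it assumes a hypothetical deterministic corridor whose actions come in opposite pairs (left/right), and it assumes the opposite action's outcome can be computed from that known model, so each visited state receives two updates per step. All names and parameters are illustrative.

```python
import random

ACTIONS = (-1, 1)  # left and right; each is the other's "opposite"


def step(s, a, terminal=5):
    """Hypothetical deterministic corridor; entering the terminal state pays +1."""
    s_next = min(max(s + a, 0), terminal)
    return s_next, (1.0 if s_next == terminal else 0.0)


def opposition_q_learning(episodes=50, alpha=0.5, gamma=0.9, epsilon=0.2,
                          terminal=5, use_opposite=True, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(terminal + 1) for a in ACTIONS}

    def update(s, a, r, s_next):
        """Standard one-step Q-learning update."""
        if s_next == terminal:
            target = r
        else:
            target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != terminal:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s_next, r = step(s, a, terminal)
            update(s, a, r, s_next)
            if use_opposite:
                # Assumption of this sketch: the opposite action's outcome comes from
                # the known deterministic model, giving state s a second update.
                s_opp, r_opp = step(s, -a, terminal)
                update(s, -a, r_opp, s_opp)
            s = s_next
    return Q


if __name__ == "__main__":
    Q = opposition_q_learning()
    print([round(Q[(s, 1)], 3) for s in range(5)])
```

Setting `use_opposite=False` recovers plain Q-learning, which makes it easy to compare how quickly the two variants fill in the table.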
Algorithms and Representations for Reinforcement Learning
This thesis introduces a new class of reinforcement learning algorithms that leverage the power of a set of statistical tools known as Gaussian processes, and offers viable solutions to some of the major limitations of current reinforcement learning methods.
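For readers unfamiliar with Gaussian processes, the sketch below shows the basic building block: GP regression of a value function from noisy sampled returns, which yields both a mean estimate and an uncertainty. This is plain GP regression with a zero prior mean and a squared-exponential kernel, a simpler stand-in and not the thesis's own algorithms. The 1-D states, the `sin`-shaped stand-in data, and the function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of states."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_value_posterior(states, returns, query, noise=0.1, length_scale=1.0):
    """Posterior mean and variance of V at `query` states, given noisy sampled returns.

    Returns are modelled as V(state) plus i.i.d. Gaussian noise with standard
    deviation `noise`; V has a zero-mean GP prior.
    """
    K = rbf_kernel(states, states, length_scale) + noise**2 * np.eye(len(states))
    K_star = rbf_kernel(query, states, length_scale)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    weights = np.linalg.solve(L.T, np.linalg.solve(L, returns))
    mean = K_star @ weights
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf_kernel(query, query, length_scale)) - np.sum(v**2, axis=0)
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = np.linspace(0.0, 5.0, 20)
    returns = np.sin(states) + 0.1 * rng.standard_normal(states.size)  # stand-in data
    mean, var = gp_value_posterior(states, returns, np.array([2.0, 10.0]))
    print(mean, var)
```

The posterior variance is small inside the sampled region and returns to the prior variance far from it, which is the kind of uncertainty information that can guide exploration.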


Reinforcement Learning: A Survey
Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
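The exploration/exploitation trade-off is easiest to see in a multi-armed bandit. The sketch below uses epsilon-greedy action selection with incremental sample-average estimates; it assumes hypothetical arms with Gaussian rewards (unit variance) and illustrative parameter values.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a hypothetical Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: uniformly random arm
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit: best estimate so far
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return estimates, counts, total / steps


if __name__ == "__main__":
    est, counts, avg = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print(est, counts, avg)
```

With `epsilon = 0` the agent can lock onto whichever arm looked best early; a small positive `epsilon` keeps sampling every arm so the estimates, and hence the greedy choice, eventually become reliable.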
Self-improving reactive agents based on reinforcement learning, planning and teaching
This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching.
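Experience replay, one of the three extensions, can be sketched by adding a buffer of past transitions to tabular Q-learning and re-applying updates to a few stored transitions after every real step. This is a minimal illustration under assumptions (a hypothetical deterministic corridor, uniform sampling from a bounded `deque`, illustrative parameters), not the paper's exact procedure.

```python
import random
from collections import deque


def step(s, a, terminal=4):
    """Hypothetical deterministic corridor; entering the terminal state pays +1."""
    s_next = min(max(s + a, 0), terminal)
    return s_next, (1.0 if s_next == terminal else 0.0)


def q_learning_with_replay(episodes=30, replay_updates=10, alpha=0.5, gamma=0.9,
                           epsilon=0.2, terminal=4, capacity=500, seed=0):
    rng = random.Random(seed)
    actions = (-1, 1)
    Q = {(s, a): 0.0 for s in range(terminal + 1) for a in actions}
    buffer = deque(maxlen=capacity)  # most recent (s, a, r, s_next) transitions

    def q_update(s, a, r, s_next):
        if s_next == terminal:
            target = r
        else:
            target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != terminal:
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next, r = step(s, a, terminal)
            q_update(s, a, r, s_next)  # ordinary online update
            buffer.append((s, a, r, s_next))
            # Experience replay: re-present a few stored transitions
            for _ in range(min(replay_updates, len(buffer))):
                q_update(*rng.choice(buffer))
            s = s_next
    return Q


if __name__ == "__main__":
    Q = q_learning_with_replay()
    print([round(Q[(s, 1)], 3) for s in range(4)])
```

Replaying old transitions lets the reward found at the end of the corridor propagate back to earlier states without waiting for the agent to revisit them, which is the speed-up this extension targets.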
Problem solving with reinforcement learning
This thesis is concerned with practical issues surrounding the application of reinforcement learning techniques to tasks that take place in high-dimensional, continuous state-space environments.
Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons
This paper describes the input generalization problem (whereby the system must generalize to produce similar actions in similar situations) and an implemented solution, the G algorithm, which recursively splits the state space according to statistical measures of differences in the reinforcements received.
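The splitting decision can be illustrated with a simplified statistical test: for each binary input bit, compare the returns observed when the bit is 0 against those when it is 1, and split on the bit whose difference is most significant. This sketch assumes Welch's t statistic, a fixed threshold, and a flat list of `(bits, return)` samples; the real G algorithm's statistics and bookkeeping may differ, and all names here are illustrative.

```python
import math
import random
from statistics import mean, variance


def welch_t(xs, ys):
    """Welch's t statistic for the difference in means of two samples."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    diff = mean(xs) - mean(ys)
    if se == 0.0:  # both samples constant: no noise to compare against
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def choose_split(samples, n_bits, threshold=2.5, min_count=5):
    """Pick the input bit whose value most significantly changes the observed return.

    samples: list of (bits, return) pairs, where bits is a tuple of 0/1 inputs.
    Returns the bit index to split on, or None if no bit passes the threshold.
    """
    best_bit, best_t = None, threshold
    for i in range(n_bits):
        zeros = [g for bits, g in samples if bits[i] == 0]
        ones = [g for bits, g in samples if bits[i] == 1]
        if len(zeros) < min_count or len(ones) < min_count:
            continue  # too few observations to judge this bit
        t = abs(welch_t(zeros, ones))
        if t > best_t:
            best_bit, best_t = i, t
    return best_bit


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(200):
        bits = tuple(rng.randrange(2) for _ in range(3))
        data.append((bits, (1.0 if bits[1] else 0.0) + rng.gauss(0.0, 0.3)))
    print(choose_split(data, n_bits=3))  # bit 1 drives the synthetic returns
```

Applied recursively to the samples falling on each side of a chosen split, the same test grows a tree that distinguishes only the inputs that matter for reinforcement.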
Importance sampling for reinforcement learning with multiple objectives
This thesis considers three complications that arise from applying reinforcement learning to a real-world application, and employs importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with little data.
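The likelihood-ratio idea can be shown with the basic ordinary importance-sampling estimator: episodes collected under a behavior policy are reweighted by the product of probability ratios so that their average estimates the target policy's expected return. This is the textbook estimator, not the thesis's specific estimators; the two-step toy task (reward 1 whenever action 1 is taken), the uniform behavior policy, and the target policy that picks action 1 with probability 0.9 are assumptions, giving a true target value of 1.8.

```python
import random


def importance_sampling_estimate(episodes, target_prob, behavior_prob):
    """Ordinary importance-sampling estimate of the target policy's expected return.

    episodes: list of (trajectory, return), where trajectory is a list of (state, action).
    target_prob / behavior_prob: functions (state, action) -> action probability.
    """
    total = 0.0
    for trajectory, G in episodes:
        rho = 1.0
        for s, a in trajectory:
            rho *= target_prob(s, a) / behavior_prob(s, a)  # likelihood ratio
        total += rho * G
    return total / len(episodes)


def target_prob(s, a):
    return 0.9 if a == 1 else 0.1  # hypothetical target policy


def behavior_prob(s, a):
    return 0.5  # uniform behavior policy over actions {0, 1}


def generate_episodes(n, horizon=2, seed=0):
    """Sample episodes from the behavior policy; reward 1 for each time action 1 is taken."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        trajectory, G = [], 0.0
        for t in range(horizon):
            a = rng.randrange(2)
            trajectory.append((t, a))
            G += 1.0 if a == 1 else 0.0
        episodes.append((trajectory, G))
    return episodes


if __name__ == "__main__":
    est = importance_sampling_estimate(generate_episodes(20000), target_prob, behavior_prob)
    print(f"IS estimate {est:.3f} (true target value 1.8)")
```

The estimator is unbiased, but its variance grows with the horizon because the ratios multiply, which is one reason data efficiency is a concern when episodes are few.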
Adaptive Confidence and Adaptive Curiosity
This paper introduces ways of modelling the reliability of the outputs of adaptive predictors, and describes more sophisticated and sometimes more efficient methods for their adaptive construction by on-line state-space exploration.
Modular on-line function approximation for scaling up reinforcement learning
This dissertation extends existing ways of scaling up reinforcement learning methods and proposes several new approaches that can be used to enable reinforcement learning agents to acquire context-dependent evaluation functions and policies.
Gradient Descent for General Reinforcement Learning
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms, and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.
Adaptive Critics and the Basal Ganglia
One consequence of the embedded agent view is the increasing interest in the learning paradigm called reinforcement learning (RL).