Reinforcement Learning: A Survey

@article{Kaelbling1996ReinforcementLA,
  title={Reinforcement Learning: A Survey},
  author={Leslie Pack Kaelbling and Michael L. Littman and Andrew W. Moore},
  journal={ArXiv},
  year={1996},
  volume={cs.AI/9605103}
}
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in… 
Reinforcement Learning: An Introduction
TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Reinforcement Learning: A Review from a Machine Learning Perspective
TLDR
This paper provides an overview of reinforcement learning from a machine learning perspective and presents the nature of RL problems, with a focus on some influential model-free RL algorithms, challenges, and recent trends in theory and practice.
Use of Reinforcement Learning as a Challenge: A Review
TLDR
This paper discusses the basic model of reinforcement learning, the optimal policies used in RL to reward the agent, including model-free and model-based policies, and some of the future research scope in reinforcement learning.
Control Optimization with Reinforcement Learning
TLDR
This chapter focuses on a relatively new methodology called reinforcement learning (RL), a form of simulation-based dynamic programming, primarily used for solving Markov and semi-Markov decision problems.
Algorithms for Reinforcement Learning
TLDR
This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, and gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by the discussion of their theoretical properties and limitations.
A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions
TLDR
A framework for presenting the available reinforcement learning methods is provided that is informative yet simple enough for new researchers and academics in this domain to follow, taking the latest concerns into account.
REINFORCEMENT LEARNING IN COMPLEX REAL WORLD DOMAINS: A REVIEW
TLDR
This paper gives an introduction to Reinforcement Learning, discusses its basic model and system structure, and discusses the problems faced while implementing RL algorithms in complex real world domains.
A survey of inverse reinforcement learning techniques
TLDR
The original IRL algorithms and their close variants, as well as recent advances, are reviewed and compared.
Reinforcement Learning in R
TLDR
This paper demonstrates how to perform reinforcement learning in R and introduces the ReinforcementLearning package, which provides a remarkably flexible framework and is easily applied to a wide range of different problems.
A survey of reinforcement learning in relational domains
TLDR
The aim is to give a complete survey of the available literature, of the underlying motivations, and of the implications of the new methods for learning in large, relational, and probabilistic environments.

References

SHOWING 1-10 OF 229 REFERENCES
Reinforcement learning for robots using neural networks
TLDR
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.
Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons
TLDR
This paper describes the input generalization problem (whereby the system must generalize to produce similar actions in similar situations) and an implemented solution, the G algorithm, which recursively splits the state space according to statistical measures of differences in reinforcements received.
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
TLDR
This paper studies three connectionist approaches which learn to use history to handle perceptual aliasing: the window-Q, recurrent-Q, and recurrent-model architectures.
Learning in embedded systems
TLDR
This dissertation addresses the problem of designing algorithms for learning in embedded systems using Sutton's techniques for linear association and reinforcement comparison, while the interval estimation algorithm uses the statistical notion of confidence intervals to guide its generation of actions.
Efficient Reinforcement Learning
TLDR
A new formal model for studying reinforcement learning, based on Valiant's PAC framework, that requires the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1−δ.
On-line Q-learning using connectionist systems
TLDR
Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
Generalization and Scaling in Reinforcement Learning
TLDR
This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
Efficient reinforcement learning
TLDR
A new formal model for studying reinforcement learning, based on Valiant's PAC framework, that requires the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1−δ.
Continual learning in reinforcement environments
TLDR
CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development, is proposed, described, tested, and evaluated in this dissertation. It generates a hierarchical, higher-order neural network that can be used for predicting context-dependent temporal sequences and can learn sequential-task benchmarks more than two orders of magnitude faster than competing neural-network systems.