Reinforcement learning - an introduction


The reinforcement learning (RL) problem is the challenge of artificial intelligence in a microcosm: how can we build an agent that can plan, learn, perceive, and act in a complex world? There’s a great new book on the market that lays out the conceptual and algorithmic foundations of this exciting area. RL pioneers Rich Sutton and Andy Barto have published Reinforcement Learning: An Introduction, providing a highly accessible starting point for interested students, researchers, and practitioners.

In the RL framework, an agent acts in an environment whose state it can sense, and occasionally receives some penalty or reward based on its state and action. Its learning task is to select actions that maximize its reward over the long haul. This requires not only choosing actions that are associated with high reward in the current state, but also “thinking ahead” by choosing actions that will lead the agent to more lucrative parts of the state space. While there are many ways to attack this problem, the paradigm described in the book is to construct a value function that evaluates the “goodness” of different situations. In particular, the value of a state is the long-term reward that can be attained starting from that state if actions are chosen optimally. Recent research has produced a flurry of algorithms for learning value functions, theoretical insights into their power and limitations, and a series of fielded applications.

The authors have done a wonderful job of boiling down disparate and complex RL algorithms to a set of fundamental components, then showing how these components work together. The differences between Dynamic Programming, Monte Carlo Methods, and Temporal-Difference Learning are teased apart, then tied back together in a new, unified way. Innovations such as “backup diagrams”, which decorate the book cover, help convey the power and excitement behind RL methods to both novices and RL veterans like us.
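To make the notion of a value function described above concrete, here is a minimal sketch in Python. It is not taken from the book or the review: the four-state chain, the reward of 1 for reaching the last state, the discount factor of 0.9, and the names `step`, `value_iteration`, and `greedy_action` are all illustrative assumptions. It uses value iteration, a standard dynamic programming method, to compute the long-term reward attainable from each state when actions are chosen optimally.

```python
# Hypothetical toy problem (an assumption for illustration only):
# states 0..3 in a row, state 3 is the terminal goal.
# Actions: 0 = move left, 1 = move right. Entering state 3 pays reward 1.
N_STATES = 4
ACTIONS = (0, 1)
GAMMA = 0.9  # discount factor, chosen arbitrarily for this sketch


def step(state, action):
    """Deterministic model of the environment: returns (next_state, reward)."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def value_iteration(tol=1e-10):
    """Repeatedly back up each state's value from its best successor."""
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_action(state, values):
    """Pick the action whose immediate reward plus discounted value is largest."""
    def backed_up(action):
        next_state, reward = step(state, action)
        return reward + GAMMA * values[next_state]
    return max(ACTIONS, key=backed_up)


if __name__ == "__main__":
    v = value_iteration()
    print(v)
    print([greedy_action(s, v) for s in range(N_STATES - 1)])
```

In this toy problem the states farther from the goal receive smaller values because their reward is discounted over more steps, and acting greedily with respect to the computed values moves the agent toward the goal. This is the "thinking ahead" that the value function encodes.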
The book consists of three parts, one dedicated to the problem description, and two others to a range of reinforcement learning algorithms, their analysis, and related research issues. We enthusiastically applaud the authors’ decision to articulate the problem addressed in the book before talking at length about its various solutions. After all, a thorough discussion of the problem is necessary for anyone to understand the aims and scope of reinforcement learning research, and all the more so for novices in the field. At 85 pages, however, Part I might make one wonder why the description of the reinforcement learning problem deserves (or requires?) twice as many pages as the typical journal paper. Is the reinforcement learning problem so complicated that it takes that long to describe and discuss it? In truth, Part I does much more than just pose the problem. Chapter 1 contains a highly informal introduction to the broad problem domain: learning to select actions while interacting with an environment in order to achieve long-term goals. The example of tic-tac-toe makes concepts such as reward, value functions, and the exploration-exploitation dilemma feel natural; all of these concepts receive a more mathematical treatment later in the book. The first chapter also provides an invaluable description of the history of reinforcement learn-

DOI: 10.1109/TNN.1998.712192

Cite this paper

@inproceedings{Sutton1998ReinforcementL, title={Reinforcement learning - an introduction}, author={Richard S. Sutton and Andrew G. Barto}, booktitle={Adaptive computation and machine learning}, year={1998} }