- Published 1998 in Adaptive computation and machine learning

The reinforcement learning (RL) problem is the challenge of artificial intelligence in a microcosm: how can we build an agent that can plan, learn, perceive, and act in a complex world? There is a great new book on the market that lays out the conceptual and algorithmic foundations of this exciting area. RL pioneers Rich Sutton and Andy Barto have published *Reinforcement Learning: An Introduction*, providing a highly accessible starting point for interested students, researchers, and practitioners.

In the RL framework, an agent acts in an environment whose state it can sense, and occasionally receives some penalty or reward based on its state and action. Its learning task is to select actions that maximize its reward over the long haul; this requires not only choosing actions associated with high reward in the current state, but also "thinking ahead" by choosing actions that will lead the agent to more lucrative parts of the state space. While there are many ways to attack this problem, the paradigm described in the book is to construct a value function that evaluates the "goodness" of different situations. In particular, the value of a state is the long-term reward that can be attained starting from that state if actions are chosen optimally. Recent research has produced a flurry of algorithms for learning value functions, theoretical insights into their power and limitations, and a series of fielded applications.

The authors have done a wonderful job of boiling down disparate and complex RL algorithms to a set of fundamental components, then showing how these components work together. The differences between Dynamic Programming, Monte Carlo Methods, and Temporal-Difference Learning are teased apart, then tied back together in a new, unified way. Innovations such as "backup diagrams", which decorate the book cover, help convey the power and excitement behind RL methods to both novices and RL veterans like us.
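The value-function idea at the heart of the book can be made concrete with a small sketch. Below is a minimal tabular TD(0) learner on a toy five-state chain; the environment, random policy, and parameter values are our own illustrative assumptions, not taken from the book:

```python
import random

random.seed(0)

# Illustrative toy setup (not from the book): states 0..4 on a chain,
# state 4 is terminal and entering it yields reward 1.
N_STATES = 5
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

V = [0.0] * N_STATES  # tabular value estimates, initialized to zero

for episode in range(2000):
    s = 0
    while s != N_STATES - 1:
        # Purely illustrative random policy: step forward or stay put.
        s_next = min(s + random.choice([0, 1]), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # TD(0) update: nudge V(s) toward the one-step bootstrapped target.
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print(V)  # states nearer the rewarding terminal state get higher values
```

After training, the learned values increase toward the rewarding terminal state, which is exactly the "goodness" ordering a value function is meant to capture.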
The book consists of three parts, one dedicated to the problem description, and the other two to a range of reinforcement learning algorithms, their analysis, and related research issues. We enthusiastically applaud the authors' decision to articulate the problem addressed in the book before talking at length about its various solutions. After all, a thorough discussion of the problem is necessary to understand the aims and scope of reinforcement learning research, not least for novices in the field. At 85 pages, however, one might wonder what it is about the reinforcement learning problem that its description deserves (or requires?) twice as many pages as the typical journal paper. Is the reinforcement learning problem so complicated that it takes that long to describe and discuss? In truth, Part I does much more than just pose the problem. Chapter 1 contains a highly informal introduction to the broad problem domain: learning to select actions while interacting with an environment in order to achieve long-term goals. The example of tic-tac-toe makes concepts such as reward, value functions, and the exploration-exploitation dilemma feel natural; all of these concepts find a more mathematical treatment later in the book. The first chapter also provides an invaluable description of the history of reinforcement learning.


@article{Sutton1998ReinforcementL,
title={Reinforcement learning - an introduction},
author={Richard S. Sutton and Andrew G. Barto},
journal={IEEE Trans. Neural Networks},
year={1998},
volume={9},
pages={1054-1054}
}