Adaptive Critics and the Basal Ganglia

Abstract

One of the most active areas of research in artificial intelligence is the study of learning methods by which "embedded agents" can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop interaction. An embedded agent has to make decisions under time pressure and uncertainty and has to learn without the help of an ever-present knowledgeable teacher. Although the novelty of this emphasis may be inconspicuous to a biologist, animals being the prototypical embedded agents, this emphasis is a significant departure from the more traditional focus in artificial intelligence on reasoning within circumscribed domains removed from the flow of real-world events. One consequence of the embedded agent view is the increasing interest in the learning paradigm called reinforcement learning (RL). Unlike the more widely studied supervised learning systems, which learn from a set of examples of correct input/output behavior, RL systems adjust their behavior with the goal of maximizing the frequency and/or magnitude of the reinforcing events they encounter over time. While the core ideas of modern RL come from theories of animal classical and instrumental conditioning (although the specific term "reinforcement learning" is not used by psychologists), the influence of concepts from artificial intelligence and control theory has produced a collection of computationally powerful learning architectures. Despite similarities between some of these architectures and the structure and function of certain brain regions, relatively little effort has been made to relate these architectures to the nervous system (but see Houk 1992, Klopf 1982, Wickens 1990, and Werbos 1987).
In this article I describe the RL system called the actor-critic architecture, giving enough detail so that it can be related to basal-ganglionic circuits and dopamine neurons. Specifically, I focus on a component of this architecture called the adaptive critic, whose behavior seems remarkably similar to that of the dopamine neurons projecting to the striatum and frontal cortex (Schultz, this workshop). In a companion article in this volume, Houk, Adams, and Barto (1994) present a hypothesis about how the actor-critic architecture might be implemented by the circuits of the basal ganglia and associated brain structures. My explanation of the ...
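The abstract gives no equations, so the following is only a minimal sketch of the kind of computation an adaptive critic is usually taken to perform, assuming the standard temporal-difference (TD) formulation. The five-state chain, the reward placement, the function name `run_episode`, and the parameters `alpha` and `gamma` are all illustrative assumptions, not details from the article. The actor, which in an actor-critic system would use the same error signal to adjust action selection, is omitted.

```python
# Hypothetical example: a tabular critic on a deterministic chain of states.
# The agent moves 0 -> 1 -> ... -> n-1, and a reward of 1.0 arrives only on
# the final step. None of these specifics come from the article.

def run_episode(values, alpha=0.1, gamma=1.0):
    """Make one pass through the chain, updating `values` in place.

    Returns the list of TD errors, one per step.
    """
    n = len(values)
    td_errors = []
    for s in range(n):
        terminal = (s == n - 1)
        reward = 1.0 if terminal else 0.0
        # The value of the state after the final step is defined as zero.
        next_value = 0.0 if terminal else values[s + 1]
        # TD error: received reward plus predicted future reward,
        # minus what the critic predicted for the current state.
        delta = reward + gamma * next_value - values[s]
        values[s] += alpha * delta
        td_errors.append(delta)
    return td_errors


values = [0.0] * 5
first = run_episode(values)      # errors before any prediction has formed
for _ in range(2000):
    last = run_episode(values)   # errors once predictions have settled

print("first episode TD errors:", first)
print("late episode TD errors: ", [round(d, 6) for d in last])
```

In this sketch the error is large at the moment of an unpredicted reward and shrinks toward zero once the reward is fully predicted. That qualitative pattern is the sort of behavior the comparison with dopamine neurons rests on, though the sketch makes no claim about the article's own model.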

References

Houk, J. C., Adams, J. L., and Barto, A. G. (1994). A Model of How the Basal Ganglia Might Generate and Use Neural Signals that Predict Reinforcement.

Houk, J. C. (1992). Learning in Modular Networks. NPB Technical Report.

Barto, A. G. (1992). Reinforcement Learning and Adaptive Critic Methods. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches.

Barto, A. G. (1991). Some Learning Tasks from a Control Perspective. In 1990 Lectures in Complex Systems.

Barto, A. G., Sutton, R. S., and Watkins, C. J. C. H. (1990). Learning and Sequential Decision Making.