Learn More
Many adaptive neural network theories are based on neuronlike adaptive elements that can behave as single unit analogs of associative conditioning. In this article we develop a similar adaptive element, but one which is more closely in accord with the facts of animal learning theory than elements commonly studied in adaptive network research. We suggest(More)
One of the most active areas of research in artificial intelligence is the study of learning methods by which " embedded agents " can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop(More)
for helping to clarify the relationships between heuristic search and control. We thank Rich Sutton, Chris Watkins, Paul Werbos, and Ron Williams for sharing their fundamental insights into this subject through numerous discussions , and we further thank Rich Sutton for rst making us aware of Korf's research and for his very thoughtful comments on the(More)
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather(More)
This paper describes the application of reinforcement learning (RL) to the dicult real world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most RL research to date. Elevator systems operate in continuous state spaces and in continuous time as discrete event dynamic systems. Their states are not fully(More)
We introduce two new temporal diffence (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this(More)