Andrew G. Barto

The reinforcement learning (RL) problem is the challenge of artificial intelligence in a microcosm: how can we build an agent that can plan, learn, perceive, and act in a complex world? There’s a great new book on the market that lays out the conceptual and algorithmic foundations of this exciting area. RL pioneers Rich Sutton and Andy Barto have published…
Many adaptive neural network theories are based on neuronlike adaptive elements that can behave as single unit analogs of associative conditioning. In this article we develop a similar adaptive element, but one which is more closely in accord with the facts of animal learning theory than elements commonly studied in adaptive network research. We suggest…
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather…
One of the most active areas of research in artificial intelligence is the study of learning methods by which “embedded agents” can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop…
Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems…
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We…
We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this…
Increasingly many wireless sensor network deployments are using harvested environmental energy to extend system lifetime. Because the temporal profiles of such energy sources exhibit great variability due to dynamic weather patterns, an important problem is designing an adaptive duty-cycling mechanism that allows sensor nodes to maintain their power supply…