Corpus ID: 13528549

Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction

@inproceedings{Sutton2011HordeAS,
  title={Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction},
  author={Richard S. Sutton and Joseph Modayil and Michael Delp and Thomas Degris and Patrick M. Pilarski and Adam White and Doina Precup},
  booktitle={AAMAS},
  year={2011}
}
Maintaining accurate world knowledge in a complex and changing environment is a perennial problem for robots and other artificial intelligence systems. Our architecture for addressing this problem, called Horde, consists of a large number of independent reinforcement learning sub-agents, or demons. Each demon is responsible for answering a single predictive or goal-oriented question about the world, thereby contributing in a factored, modular way to the system's overall knowledge. The questions…
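The factored demon design described in the abstract can be sketched minimally. This is an illustrative reconstruction, not the authors' code: all names and constants below are assumptions, and each demon is reduced to an independent linear TD(0) learner with its own target signal and discount, sharing one observation-derived feature vector.

```python
import numpy as np

class Demon:
    """One sub-agent answering a single predictive question (hypothetical sketch)."""

    def __init__(self, n_features, gamma=0.9, alpha=0.1):
        self.w = np.zeros(n_features)  # learned weights for this question
        self.gamma = gamma             # per-demon discount, i.e. its timescale
        self.alpha = alpha             # step size

    def update(self, phi, signal, phi_next):
        # Standard linear TD(0) update toward this demon's own target signal.
        delta = signal + self.gamma * self.w @ phi_next - self.w @ phi
        self.w += self.alpha * delta * phi
        return delta

# Many independent demons share one feature vector; each answers its own question,
# here distinguished only by timescale for simplicity.
rng = np.random.default_rng(0)
demons = [Demon(n_features=8, gamma=g) for g in (0.0, 0.5, 0.9)]
phi = rng.random(8)
for _ in range(100):
    phi_next = rng.random(8)
    for d in demons:
        d.update(phi, signal=1.0, phi_next=phi_next)
    phi = phi_next
```

Because each demon's update touches only its own weight vector, the population scales by simple replication, which is the property the architecture exploits.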
Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning
A novel goal-oriented agent called Q-map is proposed that utilizes an autoencoder-like neural network to predict the minimum number of steps towards each coordinate in a single forward pass, and it is shown how this network can be efficiently trained with a 3D variant of Q-learning to update the estimates towards all goals at once.
Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation
This work shows that a simulated robotic car and a real-world RC car can gather data and train fully autonomously, without any human-provided labels beyond those needed to train the detectors, and can then accomplish a variety of different tasks at test time.
Visual Reinforcement Learning with Imagined Goals
An algorithm is proposed that acquires general-purpose skills by combining unsupervised representation learning with reinforcement learning of goal-conditioned policies; it is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and it substantially outperforms prior techniques.
Multi-timescale Nexting in a Reinforcement Learning Robot
This paper presents results with a robot that learns to next in real time, predicting thousands of features of the world's state, including all sensory inputs, at timescales from 0.1 to 8 seconds.
BECCA: Reintegrating AI for Natural World Interaction
  • B. Rohrer
  • Computer Science
  • AAAI Spring Symposium: Designing Intelligent Robots
  • 2012
A brain-emulating cognition and control architecture is developed that uses a combination of feature creation and model-based reinforcement learning to capture structure in the environment in order to maximize reward.
Modular RL for Real-Time Learning in Physical Environments
  • Per R. Leikanger
  • Computer Science
  • 2019 Conference on Cognitive Computational Neuroscience
  • 2019
This work aims to create a distributed agent inspired by this beautiful complex system comprised of very simple building blocks; it argues that the resulting orientation can be seen as general for any N-dimensional continuous parameter space, and shows a simple example of how learning can happen individually across state spaces.
Self-organizing maps for storage and transfer of knowledge in reinforcement learning
A novel approach is presented for reusing previously acquired knowledge to guide an agent's exploration while it learns new tasks, employing a variant of the growing self-organizing map algorithm trained with a similarity measure defined directly in the space of vectorized value-function representations.
Multi-timescale nexting in a reinforcement learning robot
This paper presents results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds, and extends nexting beyond simple timescales by letting the discount rate be a function of the state.
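The state-dependent discount mentioned in this summary can be illustrated with a small sketch. This is a hypothetical setup of my own (states, values, and the termination rule are all assumptions): in tabular TD(0) on a ring of states, letting gamma go to zero on entering a designated state makes the prediction terminate there.

```python
import numpy as np

n_states, alpha = 4, 0.2
V = np.zeros(n_states)  # predicted discounted sum of the cumulant from each state

def gamma(s_next):
    # Discount as a function of the next state: gamma = 0 acts as termination
    # whenever state 0 is entered.
    return 0.0 if s_next == 0 else 0.9

s = 0
for t in range(4000):
    s_next = (s + 1) % n_states     # deterministic walk around the ring
    cumulant = 1.0                   # stand-in for a sensor signal being predicted
    V[s] += alpha * (cumulant + gamma(s_next) * V[s_next] - V[s])
    s = s_next
```

With this rule the prediction from state 3 converges to exactly the one-step cumulant (the walk terminates on entering state 0), while earlier states accumulate discounted futures.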
Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation
It is shown that using the Successor Representation can improve sample efficiency and learning speed of GVFs in a continual learning setting where new predictions are incrementally added and learned over time.
Meta-learning for Predictive Knowledge Architectures: A Case Study Using TIDBD on a Sensor-rich Robotic Arm
Temporal-Difference Incremental Delta-Bar-Delta (TIDBD) is explored, a meta-learning method for temporal-difference (TD) learning which adapts a vector of many step sizes, allowing for simultaneous step-size tuning and representation learning.

References

Showing 1-10 of 28 references
Map Learning with Uninterpreted Sensors and Effectors
A set of methods is presented by which a learning agent, called a "critter," can learn a sequence of increasingly abstract and powerful interfaces to control a robot whose sensorimotor apparatus and environment are initially unknown.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and that they may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
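As a rough illustration of the option construct described above (a sketch under my own naming and toy environment, not the paper's formalism): an option bundles an initiation set, an internal policy, and a termination probability, and executes like a temporally extended action.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation: Set[int]                 # states where the option may be invoked
    policy: Callable[[int], int]         # maps state -> primitive action
    termination: Callable[[int], float]  # beta(s): probability of stopping in s

def run_option(option: Option, s: int, step, rng: random.Random):
    # Execute the option until termination; return the final state and duration
    # (the semi-MDP view of a temporally extended action).
    assert s in option.initiation
    k = 0
    while True:
        s = step(s, option.policy(s))
        k += 1
        if rng.random() < option.termination(s):
            return s, k

# Toy chain: action +1 moves right; this option runs until state 5 is reached.
go_to_5 = Option(initiation={0, 1, 2, 3, 4},
                 policy=lambda s: 1,
                 termination=lambda s: 1.0 if s == 5 else 0.0)
final, k = run_option(go_to_5, 0, lambda s, a: s + a, random.Random(0))
```

Because `run_option` returns both the resulting state and the elapsed steps, a planner can treat the option exactly like a primitive action with a variable duration.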
A Method for Clustering the Experiences of a Mobile Robot that Accords with Human Judgments
This work presents an unsupervised learning method that allows a robotic agent to identify and represent qualitatively different outcomes of actions, and shows that the models acquired by the robot correlate surprisingly well with human models of the environment.
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures based on Watkins's Q-learning, a new kind of reinforcement learning.
Linking Action to Perception in a Humanoid Robot: a Developmental Approach to Grasping
A possible sequence of developmental stages is presented which, starting from limited knowledge, enables the robot to autonomously learn to perform goal-directed actions on objects (reaching, pushing, and a simple form of grasping).
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Learning in Worlds with Objects
The goal is to integrate representational ideas from classical AI with modern learning and uncertain reasoning methods, avoiding logic's problems of inferential intractability, perceptual ungroundedness, and inability to represent uncertainty.
Neo: learning conceptual knowledge by sensorimotor interaction with an environment
It is shown how classes (categories) can be abstracted from these representations, and how this representation might be extended to express physical schemas: general, domain-independent activities that could be the building blocks of concept formation.
Temporal Abstraction in Temporal-difference Networks
A new algorithm for intra-option learning in TD networks with function approximation and eligibility traces is introduced, with empirical examples of the algorithm's effectiveness and of the greater representational expressiveness of temporally abstract TD networks.
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, …
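The two-timescale structure behind this family can be sketched as follows. This is a simplified, assumed illustration of a TDC-style gradient-TD update with λ = 0 and no importance weighting, not the GQ(λ) algorithm itself; the constants are placeholders.

```python
import numpy as np

def gtd_update(w, h, phi, r, phi_next, gamma=0.9, alpha=0.05, beta=0.01):
    # Primary weights w are corrected by a secondary estimator h, which tracks
    # the expected TD error projected onto the features (the gradient-correction
    # term that keeps the update stable under function approximation).
    delta = r + gamma * w @ phi_next - w @ phi
    w += alpha * (delta * phi - gamma * (h @ phi) * phi_next)
    h += beta * (delta - h @ phi) * phi
    return delta

# Run the update on a stream of random feature vectors with a constant signal.
rng = np.random.default_rng(1)
w, h = np.zeros(4), np.zeros(4)
phi = rng.random(4)
for _ in range(500):
    phi_next = rng.random(4)
    gtd_update(w, h, phi, 0.5, phi_next)
    phi = phi_next
```

The secondary step size beta is typically set smaller than alpha so that h tracks a slowly moving target while w does the main learning.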