Introspective Agents: Confidence Measures for General Value Functions

Craig Sherstan, Adam White, Marlos C. Machado, and P. Pilarski
Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. This position paper builds on the idea of encoding knowledge as temporally extended predictions through the use of general value functions. Prior work has focused on learning…
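The central object in this line of work, a general value function (GVF), is a value-function-style prediction of an arbitrary cumulant signal under its own discount. As a minimal illustrative sketch (the function name, defaults, and toy signal are assumptions, not taken from the paper), one linear TD(0) update for a GVF looks like:

```python
import numpy as np

def gvf_td0_update(w, x, x_next, cumulant, gamma, alpha=0.1):
    """One linear TD(0) step for a general value function.

    w         -- weight vector (the prediction is w @ x)
    x, x_next -- feature vectors for successive time steps
    cumulant  -- the signal being predicted (reward is one special case)
    gamma     -- discount / termination factor for this prediction
    """
    delta = cumulant + gamma * (w @ x_next) - (w @ x)  # TD error
    return w + alpha * delta * x

# Toy check: predict a constant cumulant of 1 with gamma = 0.9 from a single
# constant feature; the fixed point is 1 / (1 - 0.9) = 10.
w = np.zeros(1)
x = np.ones(1)
for _ in range(2000):
    w = gvf_td0_update(w, x, x, cumulant=1.0, gamma=0.9)
```

Swapping in different cumulants and discounts yields the many parallel predictions these architectures maintain.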
What's a Good Prediction? Issues in Evaluating General Value Functions Through Error
This paper contributes a first look into the evaluation of predictions through their use, an integral component of predictive knowledge which has not yet been explored.
Predictions, Surprise, and Predictions of Surprise in General Value Function Architectures
Effective life-long deployment of an autonomous agent in a complex environment demands that the agent has some model of itself and its environment. Such models are inherently predictive.
Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation
It is shown that using the Successor Representation can improve sample efficiency and learning speed of GVFs in a continual learning setting where new predictions are incrementally added and learned over time.
Meta-learning for Predictive Knowledge Architectures: A Case Study Using TIDBD on a Sensor-rich Robotic Arm
Temporal-Difference Incremental Delta-Bar-Delta is explored: a meta-learning method for temporal-difference (TD) learning which adapts a vector of many step sizes, allowing for simultaneous step-size tuning and representation learning.
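The TIDBD idea, adapting one step size per weight by meta-gradient descent on the TD error, can be sketched roughly as follows. The update form follows published IDBD/TIDBD descriptions; the function name, meta step size default, and toy problem are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def tidbd_step(w, beta, h, x, x_next, cumulant, gamma, theta=0.01):
    """One TD(0) step with per-weight step sizes adapted IDBD-style.

    beta  -- log step sizes (the i-th step size is exp(beta[i]))
    h     -- decaying per-weight trace of recent updates
    theta -- meta step size (an assumed default)
    """
    delta = cumulant + gamma * (w @ x_next) - (w @ x)  # TD error
    beta = beta + theta * delta * x * h                # meta-gradient step
    alpha = np.exp(beta)
    w = w + alpha * delta * x                          # TD update, per-weight alpha
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, beta, h

# Same toy prediction problem: constant cumulant 1, gamma 0.9, one feature.
w, beta, h = np.zeros(1), np.log(np.full(1, 0.05)), np.zeros(1)
x = np.ones(1)
for _ in range(5000):
    w, beta, h = tidbd_step(w, beta, h, x, x, cumulant=1.0, gamma=0.9)
```

Because correlated TD errors push `beta` up and uncorrelated ones push it down, weights whose features are consistently useful end up with larger step sizes.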
Incrementally Added GVFs are Learned Faster with the Successor Representation
We propose using the successor representation (SR) to accelerate learning in a constructive knowledge system based on general value functions (GVFs). We consider an agent which incrementally adds and learns new predictions over time.
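The mechanism behind this speed-up can be sketched: once the successor representation is learned, the value of any newly added cumulant is a single matrix-vector product rather than a fresh TD learning problem. A tabular toy example (the names and the two-state chain are illustrative assumptions, not from the paper):

```python
import numpy as np

def sr_td_update(M, s, s_next, gamma, alpha=0.1):
    """TD(0) update of a tabular successor representation M (n_states x n_states).
    M[s] estimates the expected discounted future state occupancy from s."""
    onehot = np.eye(M.shape[0])[s]
    delta = onehot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * delta
    return M

# Two-state chain that deterministically alternates 0 -> 1 -> 0 -> ...
M = np.zeros((2, 2))
s = 0
for _ in range(3000):
    s_next = 1 - s
    M = sr_td_update(M, s, s_next, gamma=0.5)
    s = s_next

# A new prediction with per-state cumulant c is just M @ c -- no further TD
# learning needed, which is why incrementally added GVFs come up to speed fast.
c = np.array([0.0, 1.0])  # cumulant of 1 in state 1
v = M @ c
```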
Towards robust grasps: Using the environment semantics for robotic object affordances
The business problem that motivated the innovation, Kiva technology and the benefits it brought to customers, and the future of applications of robotics in warehouses are explained, as well as examples of the kinds of things that mobile robots can learn over long autonomous operations in such environments.
Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures
TIDBD is shown to be a practical alternative to classic temporal-difference learning via an extensive parameter search, and the sensitivity of classic TD and TIDBD with respect to the initial step-size values on the robotic data set is investigated.
Communicative Capital for Prosthetic Agents
The hypothesis that assistive devices, and specifically artificial arms and hands, can and should be viewed as agents is developed, and a new schema for interpreting the capacity of a human-machine collaboration as a function of both the human's and machine's degrees of agency is proposed.
Adaptive and Autonomous Switching: Shared Control of Powered Prosthetic Arms Using Reinforcement Learning
This work advances an RL method termed adaptive switching for use during real-time control of a prosthetic arm, and combines it with another machine learning control method, termed autonomous switching, to further decrease the number of manual switching interactions required of a user.
Machine learning and unlearning to autonomously switch between the functions of a myoelectric arm
The autonomous switching approach is described and it is demonstrated that it is able to both learn and subsequently unlearn to switch autonomously during ongoing use, a key requirement for maintaining human-centered shared control.


This thesis explores a new approach to representing and acquiring predictive knowledge on a robot. It uses recently developed gradient-TD methods, which are compatible with off-policy learning and function approximation, to explore the practicality of making and updating many predictions in parallel while the agent interacts with the world through continuous inputs on a robot.
Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
Results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience are presented.
Better Generalization with Forecasts
The results indicate that forecasts provide a substantial improvement in generalization, producing features that lead to better value-function approximation than PSRs and better generalization to as-yet-unseen parts of the state space.
Multi-timescale nexting in a reinforcement learning robot
This paper presents results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds, and extends nexting beyond simple timescales by letting the discount rate be a function of the state.
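A common way to map a timescale in seconds onto a discount rate is to match the discount's pseudo-horizon to the timescale. The sketch below assumes 10 Hz updates as an illustration; the function name and the specific rates are not taken from the paper:

```python
def timescale_to_gamma(tau_seconds, dt_seconds):
    """Discount whose pseudo-horizon 1/(1 - gamma) covers tau seconds of
    experience when predictions are updated every dt seconds."""
    steps = tau_seconds / dt_seconds
    return 1.0 - 1.0 / steps

# With 10 Hz updates (dt = 0.1 s), the 0.1 s to 8 s range of timescales
# maps to discounts from 0.0 up to 0.9875.
gammas = [timescale_to_gamma(tau, 0.1) for tau in (0.1, 1.0, 8.0)]
```

Letting the discount depend on state, as the paper does, simply replaces the fixed `gamma` with a state-conditioned function in the same TD update.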
Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains
This work investigates how to compute robust confidences for value estimates in continuous Markov decision processes and demonstrates the applicability of the confidence estimation algorithms with experiments on exploration, parameter estimation and tracking.
Curious model-building control systems
  • J. Schmidhuber
  • Computer Science
  • [Proceedings] 1991 IEEE International Joint Conference on Neural Networks
  • 1991
A novel curious model-building control system is described which actively tries to provoke situations for which it learned to expect to learn something about the environment, based on Watkins' Q-learning algorithm.
Using Predictive Representations to Improve Generalization in Reinforcement Learning
It is shown in a reinforcement-learning example (a grid-world navigation task) that a predictive representation in tabular form can learn much faster than both the tabular explicit-state representation and a tabular history-based method.
Predictive Representations of State
This is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls), and it is shown that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.
Ensemble Algorithms in Reinforcement Learning
  • M. Wiering, H. V. Hasselt
  • Computer Science, Medicine
  • IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
  • 2008
Several ensemble methods are described that combine multiple reinforcement learning (RL) algorithms in a single agent, enhancing learning speed and final performance by combining the chosen actions or action probabilities of the different algorithms.
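One of the simpler combination rules such ensembles use, a weighted average of the member policies' action probabilities, can be sketched as follows (the function name, example probabilities, and uniform weights are illustrative assumptions):

```python
import numpy as np

def combine_policies(action_prob_list, weights=None):
    """Combine the action probabilities of several RL algorithms into one
    policy by weighted averaging, then renormalize."""
    probs = np.asarray(action_prob_list, dtype=float)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    combined = weights @ probs
    return combined / combined.sum()

# Two learners disagree on the best of three actions; the ensemble's
# combined policy reflects both opinions.
combined = combine_policies([[0.7, 0.2, 0.1],
                             [0.1, 0.2, 0.7]])
```

Other rules in this family replace the average with majority voting over greedy actions or a Boltzmann combination of the members' value estimates.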
A Collaborative Approach to the Simultaneous Multi-joint Control of a Prosthetic Arm
We have developed a real-time machine learning approach for the collaborative control of a prosthetic arm. Upper-limb amputees are often extremely limited in the number of inputs they can provide.