TAMER: Training an Agent Manually via Evaluative Reinforcement

  title={TAMER: Training an Agent Manually via Evaluative Reinforcement},
  author={W. B. Knox and P. Stone},
  journal={2008 7th IEEE International Conference on Development and Learning},
  • Published 10 October 2008
  • Computer Science
Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative…
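The framework's core loop can be illustrated with a toy sketch: the agent fits a supervised model of the trainer's scalar reinforcement signal and acts greedily on its predictions, treating feedback as a myopic target rather than a long-term return. All class, method, and parameter names below are invented for illustration, not the paper's implementation:

```python
# Minimal TAMER-style learner sketch: a tabular model H_hat of the
# human's scalar reinforcement, updated by supervised error correction.

class TamerAgent:
    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.lr = lr
        self.h_hat = {}  # (state, action) -> predicted human reinforcement

    def predict(self, state, action):
        return self.h_hat.get((state, action), 0.0)

    def act(self, state):
        # Greedy with respect to predicted human reinforcement,
        # not discounted return: the feedback model is myopic.
        return max(self.actions, key=lambda a: self.predict(state, a))

    def update(self, state, action, human_feedback):
        # Move the prediction toward the trainer's scalar signal.
        err = human_feedback - self.predict(state, action)
        self.h_hat[(state, action)] = self.predict(state, action) + self.lr * err
```

The trainer only presses a "good"/"bad" button; no knowledge of the learning mechanism is required, which is the point of the framework.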


Combining manual feedback with subsequent MDP reward signals for reinforcement learning

The fast learning exhibited within the TAMER framework is leveraged to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent.
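One simple way to combine the two signals can be sketched as reward shaping: augment the MDP reward with a weighted term from the learned human-reinforcement model, and anneal the weight so MDP reward dominates as learning matures. This is a hypothetical sketch of one combination strategy, not the paper's specific method or naming:

```python
# Shape the MDP reward with the learned human-reinforcement model:
# R'(s, a) = R(s, a) + w * H_hat(s, a), with w decayed over time.

def combined_reward(mdp_reward, predicted_human_reinforcement, weight):
    """MDP reward shaped by predicted human reinforcement."""
    return mdp_reward + weight * predicted_human_reinforcement

def anneal(weight, factor=0.99):
    """Decay the influence of human reinforcement between episodes."""
    return weight * factor
```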

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

This work demonstrates a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze and proposes an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards.

Training a Tetris agent via interactive shaping: a demonstration of the TAMER framework

This work treats shaping as a specific mode of knowledge transfer, distinct from (and probably complementary to) other natural methods of communication such as programming by demonstration and advice-giving; the aim is to create agents that can be shaped effectively.

Few-Shot Preference Learning for Human-in-the-Loop RL

This work pre-trains preference models on prior task data and quickly adapts them for new tasks using only a handful of queries, reducing the amount of online feedback needed to train manipulation policies in MetaWorld by 20×, and demonstrates the effectiveness of this method on a real Franka Panda robot.

Deep Reinforcement Learning from Policy-Dependent Human Feedback

The effectiveness of the Deep COACH algorithm is demonstrated in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.

Learning from feedback on actions past and intended

  • W. B. Knox, P. Stone
  • Psychology, Computer Science
  • 2012
Robotic learning promises to eventually provide great societal benefits. In contrast to pure trial-and-error learning, human instruction has at least two benefits: (1) Human teaching can lead to much…

Mutual Reinforcement Learning

A shared cognitive model is obtained which not only improves human cognition but also enhances the robot's cognitive strategy, enabling it to understand the mental model of its human partners while building a successful robot-human collaborative framework.

Design Principles for Creating Human-Shapable Agents

A framework that allows a human to train a learning agent by giving simple scalar reinforcement signals while observing the agent perform the task is described, and a set of conjectures about aspects of human teaching behavior that could be incorporated into future work on human-trainable (HT) agents is proposed.

Learning via human feedback in continuous state and action spaces

An extension of TAMER to allow both continuous states and actions, called ACTAMER, is proposed, which utilizes any general function approximation of a human trainer’s feedback signal.

Multi-trainer Interactive Reinforcement Learning System

This paper proposes a more interactive reinforcement learning system by introducing multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a more reliable reward for an agent training in a reward-sparse environment.
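The aggregation idea can be illustrated with a deliberately simple majority vote; MTIRL's actual review-based aggregation is more sophisticated, so treat this as a stand-in:

```python
# Majority-vote aggregation of binary feedback from several imperfect
# trainers into one scalar reward signal for the agent.

def aggregate_feedback(votes):
    """votes: iterable of +1/-1 trainer signals for one agent action.
    Returns +1 or -1 on a majority, 0 when the trainers are split."""
    total = sum(votes)
    return (total > 0) - (total < 0)  # sign of the vote sum
```

With enough independent trainers who are right more often than not, the aggregate is more reliable than any individual signal.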

References

Apprenticeship learning via inverse reinforcement learning

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
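The linear-reward assumption can be made concrete with a small sketch: with R(s) = w·φ(s), a policy's expected reward is determined by its discounted feature expectations, so matching the expert's feature expectations suffices. The feature map and numbers below are invented for illustration, and this is not the paper's full algorithm:

```python
# Linear reward over known features, and discounted feature
# expectations estimated from expert demonstration trajectories.

def linear_reward(weights, features):
    """Reward as a linear combination of known features."""
    return sum(w * f for w, f in zip(weights, features))

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature counts over expert demonstrations."""
    mu = None
    for traj in trajectories:
        total = None
        for t, s in enumerate(traj):
            feats = [gamma ** t * x for x in phi(s)]
            total = feats if total is None else [a + b for a, b in zip(total, feats)]
        mu = total if mu is None else [a + b for a, b in zip(mu, total)]
    return [x / len(trajectories) for x in mu]
```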

Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance

The importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance is demonstrated.

Creating Advice-Taking Reinforcement Learners

This work presents and evaluates a design that allows a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer, and shows that, given good advice, a learner can achieve statistically significant gains in expected reward.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Integrated learning for interactive synthetic characters

An autonomous animated dog is built that can be trained with "clicker training", a technique used to train real dogs; demonstrated capabilities include being trained to recognize and use acoustic patterns as cues for actions, as well as to synthesize new actions from novel paths through its motion space.

Cobot in LambdaMOO: An Adaptive Social Statistics Agent

Cobot, a novel software agent who lives in LambdaMOO, a popular virtual world frequented by hundreds of users, uses reinforcement learning to proactively take action in this complex social environment, and adapts his behavior based on multiple sources of human reward.

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.

A simulation-theory inspired social learning system for interactive characters

Max T. Mouse is presented, an anthropomorphic animated mouse character who uses his own motor and action representations to interpret the behaviors he sees his friend Morris Mouse performing (a process known as simulation theory in the cognitive literature).

Reinforcement Learning for RoboCup Soccer Keepaway

The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
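For orientation, here is a compressed tabular Sarsa(λ) update with accumulating eligibility traces. The keepaway agents use tile-coded linear function approximation over an SMDP, so this is a simplified sketch, and every constant below is illustrative:

```python
# One on-policy TD step of tabular Sarsa(lambda) with accumulating
# eligibility traces; q and e map (state, action) -> float.

def sarsa_lambda_update(q, e, s, a, r, s2, a2, alpha=0.1, gamma=0.9, lam=0.8):
    delta = r + gamma * q.get((s2, a2), 0.0) - q.get((s, a), 0.0)
    e[(s, a)] = e.get((s, a), 0.0) + 1.0  # accumulate the visited trace
    for key in list(e):
        q[key] = q.get(key, 0.0) + alpha * delta * e[key]
        e[key] *= gamma * lam             # decay all traces
    return q, e
```

Traces let a single delayed reward update the whole recent state-action history, which matters in sparse-reward subtasks like keepaway.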

Using Prior Knowledge to Improve Reinforcement Learning in Mobile Robotics

A new strategy is proposed, called Supervised Reinforcement Learning (SRL), for taking advantage of external knowledge within this type of learning, and it is validated on a wall-following behaviour.