TAMER: Training an Agent Manually via Evaluative Reinforcement

@article{Knox2008TAMERTA,
  title={TAMER: Training an Agent Manually via Evaluative Reinforcement},
  author={W. B. Knox and P. Stone},
  journal={2008 7th IEEE International Conference on Development and Learning},
  year={2008},
  pages={292-297}
}
  • W. B. Knox, P. Stone
  • Published 10 October 2008
  • Computer Science
  • 2008 7th IEEE International Conference on Development and Learning
Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER)… 
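
The abstract's core idea lends itself to a compact illustration. Below is a minimal sketch (assumptions, not the authors' code) of a TAMER-style loop: the agent fits a model Ĥ(s, a) of the human trainer's scalar reinforcement via supervised updates and always takes the action its model predicts the trainer would reward most. The linear features, fixed learning rate, and greedy selection are illustrative choices.

```python
import numpy as np

class TamerAgent:
    """Sketch of a TAMER-style learner: it models the human's reinforcement
    signal directly (no discounted return) and acts greedily on that model.
    Linear features and the fixed learning rate are assumptions, not the paper's.
    """

    def __init__(self, n_features, actions, lr=0.05):
        self.w = np.zeros(n_features)   # weights of the human-reward model H_hat
        self.actions = actions          # discrete action set
        self.lr = lr                    # supervised learning rate

    def predict(self, features):
        # Predicted human reinforcement H_hat(s, a) for one state-action feature vector.
        return float(self.w @ features)

    def act(self, state, feature_fn):
        # Greedy with respect to predicted *human* reward, not environmental reward.
        return max(self.actions, key=lambda a: self.predict(feature_fn(state, a)))

    def update(self, features, human_reward):
        # Supervised gradient step toward the trainer's most recent signal.
        error = human_reward - self.predict(features)
        self.w += self.lr * error * features
```

In use, update() would be called with the features of recently taken actions whenever the trainer gives a reward or punishment; handling the delay between an action and the human's response is part of the full framework and is omitted from this sketch.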

Combining manual feedback with subsequent MDP reward signals for reinforcement learning

TLDR
The fast learning exhibited within the TAMER framework is leveraged to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent.
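
One concrete way to read "used in conjunction" is reward shaping: add the learned human-reward prediction to the environment's MDP reward before the RL update. The helper below is a hedged sketch of that idea; the weight and the callable's name are assumptions, and the cited work compares several combination schemes rather than prescribing this one.

```python
def shaped_reward(mdp_reward, predict_human_reward, state, action, weight=1.0):
    """Augment the MDP reward with predicted human reinforcement.

    predict_human_reward: callable (state, action) -> float, e.g. a trained
    TAMER model composed with a feature function. weight trades off the two
    signals (an assumed constant, not a value from the paper).
    """
    return mdp_reward + weight * predict_human_reward(state, action)
```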

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

TLDR
This work demonstrates a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze and proposes an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards.

Training a Tetris agent via interactive shaping: a demonstration of the TAMER framework

TLDR
This work treats shaping as a specific mode of knowledge transfer, distinct from (and probably complementary to) other natural methods of communication such as programming by demonstration and advice-giving; its aim is to create agents that can be shaped effectively.

Deep Reinforcement Learning from Policy-Dependent Human Feedback

TLDR
The effectiveness of the Deep COACH algorithm is demonstrated in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.

Learning from feedback on actions past and intended

  • W. Stone
  • Psychology, Computer Science
  • 2012
Robotic learning promises to eventually provide great societal benefits. In contrast to pure trial-and-error learning, human instruction has at least two benefits: (1) Human teaching can lead to much… 

Mutual Reinforcement Learning

TLDR
A shared cognitive model is obtained which not only improves human cognition but enhances the robot's cognitive strategy to understand the mental model of its human partners while building a successful robot-human collaborative framework.

Design Principles for Creating Human-Shapable Agents

TLDR
A framework that allows a human to train a learning agent by giving simple scalar reinforcement signals while observing the agent perform the task is described and a set of conjectures about aspects of human teaching behavior are proposed that could be incorporated into future work on HT agents.

Learning via human feedback in continuous state and action spaces

TLDR
An extension of TAMER to allow both continuous states and actions, called ACTAMER, is proposed, which utilizes any general function approximation of a human trainer’s feedback signal.

Reinforcement learning combined with human feedback in continuous state and action spaces

  • Ngo Anh Vien, W. Ertel
  • Computer Science
  • 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
  • 2012
TLDR
The new framework extends the original TAMER to allow any general function approximation of a human trainer's reinforcement signal, and investigates combining ACTAMER with reinforcement learning (RL).

Teachable Reinforcement Learning via Advice Distillation

TLDR
In puzzle-solving, navigation, and locomotion domains, it is shown that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms and often less than imitation learning.
...

References

SHOWING 1-10 OF 20 REFERENCES

Apprenticeship learning via inverse reinforcement learning

TLDR
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.

Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance

TLDR
The importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance is demonstrated.

Creating Advice-Taking Reinforcement Learners

TLDR
This work presents and evaluates a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer, and shows that, given good advice, a learner can achieve statistically significant gains in expected reward.

Reinforcement Learning: An Introduction

TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Integrated learning for interactive synthetic characters

TLDR
An autonomous animated dog is built that can be trained with a technique used to train real dogs called "clicker training" and capabilities demonstrated include being trained to recognize and use acoustic patterns as cues for actions, as well as to synthesize new actions from novel paths through its motion space.

Cobot in LambdaMOO: An Adaptive Social Statistics Agent

TLDR
Cobot, a novel software agent who lives in LambdaMOO, a popular virtual world frequented by hundreds of users, uses reinforcement learning to proactively take action in this complex social environment, and adapts his behavior based on multiple sources of human reward.

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

TLDR
The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.

A simulation-theory inspired social learning system for interactive characters

TLDR
Max T. Mouse is presented, an anthropomorphic animated mouse character who uses his own motor and action representations to interpret the behaviors he sees his friend Morris Mouse performing (a process known as simulation theory in the cognitive literature).

Reinforcement Learning for RoboCup Soccer Keepaway

TLDR
The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.

Using Prior Knowledge to Improve Reinforcement Learning in Mobile Robotics

TLDR
A new strategy is proposed, called Supervised Reinforcement Learning (SRL), for taking advantage of external knowledge within this type of learning and validate it in a wall-following behaviour.