Corpus ID: 218684707

Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text

@article{Hill2020HumanIW,
  title={Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text},
  author={Felix Hill and Sona Mokra and Nathaniel Wong and Tim Harley},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.09382}
}
Recent work has described neural-network-based agents that are trained with reinforcement learning (RL) to execute language-like commands in simulated worlds, as a step towards an intelligent agent or robot that can be instructed by human users. However, the optimisation of multi-goal motor policies via deep RL from scratch requires many episodes of experience. Consequently, instruction-following with deep RL typically involves language generated from templates (by an environment simulator… 
Intra-agent speech permits zero-shot task acquisition
TLDR
Modelling intra-agent speech is shown to enable embodied agents to learn new tasks efficiently, without direct interaction experience.
HIGhER: Improving instruction following with Hindsight Generation for Experience Replay
TLDR
This paper proposes an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay approach to the language-conditioned policy setting, and shows the efficiency of the approach in the BabyAI environment, and how it complements other instruction following methods.
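The core relabelling idea behind HIGhER can be sketched as follows. This is a minimal illustration, not the paper's implementation: `describe_achieved_goal` is a hypothetical stand-in for HIGhER's learned instruction generator, which maps the state an agent actually reached to a language goal describing it.

```python
import random
from collections import deque

def describe_achieved_goal(final_state):
    # Hypothetical placeholder for the learned instruction generator:
    # map the achieved final state to a language instruction.
    return f"go to the {final_state['object']}"

class HindsightReplayBuffer:
    """Stores episodes; failed episodes are stored a second time,
    relabelled with an instruction matching what was actually achieved."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, trajectory, instruction, success):
        # Always keep the episode under its original instruction.
        self.buffer.append((trajectory, instruction))
        if not success:
            # Hindsight relabelling: treat the failure as a success
            # for the goal the agent actually reached.
            achieved = describe_achieved_goal(trajectory[-1])
            self.buffer.append((trajectory, achieved))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

A failed "open the box" episode that ends at a door is thus stored twice: once with the original instruction (a failure) and once relabelled as a successful "go to the door" episode, turning sparse-reward failures into useful training signal.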
Pre-Trained Language Models for Interactive Decision-Making
TLDR
An approach for using LMs to scaffold learning and generalization in general sequential decision-making problems, in which goals and observations are represented as a sequence of embeddings, and a policy network initialized with a pre-trained LM predicts the next action.
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
TLDR
This paper investigates the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps and proposes a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
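The "semantic translation" step described above can be sketched as nearest-neighbour matching in an embedding space: a free-form step proposed by a language model is mapped to the most similar admissible action. This is an illustrative assumption, not the paper's code; `embed` is a hypothetical toy encoder standing in for a pretrained sentence-embedding model.

```python
import numpy as np

def embed(text, dim=16):
    # Hypothetical toy embedding (hash-seeded, deterministic per run);
    # a real system would use a pretrained sentence encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def translate_to_admissible(proposed_step, admissible_actions):
    # Map the LM's free-form plan step to the closest admissible action
    # by cosine similarity (embeddings are unit-normalised, so the dot
    # product is the cosine similarity).
    p = embed(proposed_step)
    sims = [float(embed(a) @ p) for a in admissible_actions]
    return admissible_actions[int(np.argmax(sims))]
```

The point of the sketch is the interface: the planner never emits an action the environment cannot execute, because every proposal is projected onto the admissible set.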
ELLA: Exploration through Learned Language Abstraction
TLDR
On a suite of complex grid world environments with varying instruction complexities and reward sparsity, ELLA shows a significant gain in sample efficiency across several environments compared to competitive language-based reward shaping and no-shaping methods.
LanCon-Learn: Learning With Language to Enable Generalization in Multi-Task Manipulation
TLDR
LanCon-Learn is a novel attention-based approach to language-conditioned multi-task learning in manipulation domains that enables agents to reason about relationships between skills and task objectives through natural language and interaction, demonstrating the utility of language for goal specification.
Learning to Query Internet Text for Informing Reinforcement Learning Agents
TLDR
This work proposes to address the problem of extracting useful information from natural language found in the wild by training reinforcement learning agents to learn to query these sources as a human would, and demonstrates that pretrained QA models perform well at executing zero-shot queries in the target domain.
Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
TLDR
A typology is proposed for methods at the intersection of deep RL and developmental approaches, in which deep RL algorithms are trained to tackle the developmental-robotics problem of autonomously acquiring open-ended repertoires of skills.
Reinforcement Learning of Implicit and Explicit Control Flow in Instructions
TLDR
An attention-based architecture is formulated that addresses the problem of learning control flow that deviates from strict step-by-step execution of instructions, by learning to flexibly attend to and condition behavior on an internal encoding of the instructions.
LILA: Language-Informed Latent Actions
TLDR
Language-Informed Latent Actions models are shown to be not only more sample efficient and performant than imitation learning and end-effector control baselines, but also qualitatively preferred by users.

References

SHOWING 1-10 OF 45 REFERENCES
Learning to Understand Goal Specifications by Modelling Reward
TLDR
A framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment but from reward models jointly trained from expert examples; this allows an agent to adapt to changes in the environment without requiring new expert examples.
Speaker-Follower Models for Vision-and-Language Navigation
TLDR
Experiments show that all three components of this approach (speaker-driven data augmentation, pragmatic reasoning, and panoramic action space) dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents
TLDR
This paper proposes a simple but effective neural language grounding module for embodied agents that can be trained end to end from scratch taking raw pixels, unstructured linguistic commands, and sparse rewards as the inputs.
Gated-Attention Architectures for Task-Oriented Language Grounding
TLDR
An end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior linguistic or perceptual knowledge and requires only raw pixels from the environment and the natural language instruction as input.
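The gated-attention fusion at the heart of this architecture can be sketched in a few lines: the instruction embedding is projected to one scalar gate per visual feature map, squashed through a sigmoid, and multiplied element-wise into the convolutional features. The shapes and the random projection below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gated_attention(visual_feats, instr_embedding, proj):
    """Gate each visual feature map by a sigmoid of a linear projection
    of the instruction embedding.

    visual_feats:    (channels, H, W) convolutional feature maps
    instr_embedding: (d,) instruction representation
    proj:            (channels, d) linear map to per-channel gates
    """
    gates = 1.0 / (1.0 + np.exp(-proj @ instr_embedding))  # sigmoid, (channels,)
    # Broadcast gates over the spatial dimensions of each channel.
    return visual_feats * gates[:, None, None]
```

Because the gates lie in (0, 1), the instruction can only attenuate feature maps, letting the language input select which visual features are relevant to the current command.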
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
TLDR
A new RL problem is introduced in which the agent must learn to execute sequences of instructions after learning useful skills that solve subtasks, and a new neural architecture for the meta controller is proposed that learns when to update the subtask, making learning more efficient.
ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning
TLDR
This work presents Augmenting experienCe via TeacheR's adviCE (ACTRCE), an efficient reinforcement learning technique that extends the HER framework using natural language as the goal representation, and shows that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks.
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
TLDR
This work learns a single model to jointly reason about linguistic and visual input in a contextual bandit setting to train a neural network agent and shows significant improvements over supervised learning and common reinforcement learning variants.
RTFM: Generalising to Novel Environment Dynamics via Reading
TLDR
This work proposes a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations; environment dynamics and corresponding language descriptions are procedurally generated.
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
TLDR
The BabyAI research platform is introduced to support investigations towards including humans in the loop for grounded language learning and puts forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
TLDR
This paper introduces an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine, and finds that, using the approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations.