Reinforcement Learning for Mapping Instructions to Actions

@inproceedings{Branavan2009ReinforcementLF,
  title={Reinforcement Learning for Mapping Instructions to Actions},
  author={S. R. K. Branavan and Harr Chen and Luke Zettlemoyer and Regina Barzilay},
  booktitle={ACL},
  year={2009}
}
In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our… 
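
The training loop described in the abstract (sample an action sequence, execute it, observe a reward, and update a log-linear action-selection model by policy gradient) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-action toy task, the feature vectors, and the learning rate are all hypothetical choices made for illustration, and the update shown is a plain REINFORCE-style step.

```python
import math
import random


def softmax_probs(theta, features):
    """Log-linear policy: p(a | s; theta) is proportional to exp(theta . phi(s, a))."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta, episode, reward, lr):
    """One policy gradient update from a single sampled action sequence.

    episode is a list of (features, chosen_index) pairs, one per step.
    For a log-linear model, the gradient of log p(a | s) is
    phi(s, a) minus the expected feature vector under the current policy.
    """
    grad = [0.0] * len(theta)
    for features, chosen in episode:
        probs = softmax_probs(theta, features)
        for k in range(len(theta)):
            expected = sum(p * phi[k] for p, phi in zip(probs, features))
            grad[k] += features[chosen][k] - expected
    return [t + lr * reward * g for t, g in zip(theta, grad)]


def train(num_iters=2000, lr=0.1, seed=0):
    """Toy task (hypothetical): one step, two candidate actions.

    Action 0 earns reward 1.0 and action 1 earns 0.0, so the policy
    should learn to prefer action 0.
    """
    rng = random.Random(seed)
    features = [[1.0, 0.0], [0.0, 1.0]]  # one feature vector per action
    theta = [0.0, 0.0]
    for _ in range(num_iters):
        probs = softmax_probs(theta, features)
        chosen = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if chosen == 0 else 0.0
        theta = reinforce_step(theta, [(features, chosen)], reward, lr)
    return theta


if __name__ == "__main__":
    learned = train()
    print(softmax_probs(learned, [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch the reward scales the gradient of the log-probability of the sampled actions, so action sequences that earn higher reward become more likely. A realistic system would compute features from the instruction text and the environment state rather than using fixed vectors.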

Learning to Transform Service Instructions into Actions with Reinforcement Learning and Knowledge Base
TLDR
A reinforcement learning approach with a knowledge base is proposed for mapping natural language instructions to executable action sequences, and a reward function combining immediate and delayed rewards is designed to handle sparse-reward problems.
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
TLDR
This work learns a single model that jointly reasons about linguistic and visual input, trains a neural network agent in a contextual bandit setting, and shows significant improvements over supervised learning and common reinforcement learning variants.
Guiding Reinforcement Learning Exploration Using Natural Language
TLDR
A technique is presented that uses natural language to help reinforcement learning generalize to unseen environments, applying neural machine translation (specifically encoder-decoder networks) to learn associations between natural language behavior descriptions and state-action information.
Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents
TLDR
A novel policy optimization algorithm is proposed that dynamically schedules demonstration learning and RL; the best single model based on the proposed method substantially decreases the execution error on a block-world environment.
Guiding Policies with Language via Meta-Learning
TLDR
This work proposes an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill, and shows that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.
Skill Induction and Planning with Latent Language
We present a framework for learning hierarchical policies from demonstrations, using sparse natural language annotations to guide the discovery of reusable skills for autonomous decision-making. We
Deep Reinforcement Learning with a Natural Language Action Space
This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based
Reading between the Lines: Learning to Map High-Level Instructions to Commands
TLDR
A method that fills in missing information using an automatically derived environment model, which encodes states, transitions, and the commands that cause those transitions; this enables learning to map high-level instructions, which previous statistical methods cannot handle.
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
TLDR
This work proposes language-conditioned reward learning (LC-RL), which grounds language commands as a reward function represented by a deep neural network, and demonstrates that the model learns rewards that transfer to novel tasks and environments on realistic, high-dimensional visual environments with natural language commands.

References

SHOWING 1-10 OF 25 REFERENCES
Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning
TLDR
Q-learning with eligibility traces was applied to obtain policies for a telephone-based cinema information system. The learned policies outperformed handcrafted policies that operated in the same restricted state space and gave performance similar to the original design, which had been through several iterations of manual refinement.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
TLDR
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
Reinforcement Learning for Spoken Dialogue Systems
TLDR
A general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework is built and applied to dialogue corpora gathered from two dialogue systems built at AT&T Labs, demonstrating that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.
Reinforcement Learning: An Introduction
TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Intentional Context in Situated Natural Language Learning
TLDR
A model of situated language acquisition that operates in two phases, where intentional context is represented and inferred from user actions using probabilistic context free grammars and utterances are mapped onto this representation in a noisy channel framework.
Learning Language from Its Perceptual Context
TLDR
A system that learns to sportscast simulated robot soccer games by example using textual human commentaries on Robocup simulation games and has been evaluated based on its ability to properly match sentences to the events being described, parse sentences into correct meanings, and generate accurate linguistic descriptions of events.
Learning to sportscast: a test of grounded language acquisition
TLDR
A novel commentator system that learns language from sportscasts of simulated soccer games and uses a novel algorithm, Iterative Generation Strategy Learning (IGSL), for deciding which events to comment on.
Spoken Dialogue Management Using Probabilistic Reasoning
TLDR
This work uses a Partially Observable Markov Decision Process (POMDP)-style approach to generate dialogue strategies by inverting the notion of dialogue state; the state represents the user's intentions, rather than the system state.
Introduction to Reinforcement Learning
TLDR
In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Automatic Optimization of Dialogue Management
TLDR
This paper presents a reinforcement learning approach for automatically optimizing a dialogue strategy that addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users and shows that this approach measurably improves performance in an experimental system.