Corpus ID: 52300972

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

@inproceedings{Liang2018MemoryAP,
  title={Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing},
  author={Chen Liang and Mohammad Norouzi and Jonathan Berant and Quoc V. Le and N. Lao},
  booktitle={NeurIPS},
  year={2018}
}
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of the policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation…
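The decomposition described in the abstract can be sketched in one formula. Writing \mathcal{B} for the memory buffer of high-reward trajectories, \pi_\theta for the policy, and R(a) for the return of trajectory a, a natural reading is the following (the buffer weight w and the renormalized distributions are our notation, not quoted from the paper):

  % MAPO-style split of the expected return (sketch)
  w \;=\; \sum_{a \in \mathcal{B}} \pi_\theta(a), \qquad
  O(\theta) \;=\; w\, \mathbb{E}_{a \sim \pi_\theta^{+}}\big[R(a)\big]
    \;+\; (1-w)\, \mathbb{E}_{a \sim \pi_\theta^{-}}\big[R(a)\big]

where \pi_\theta^{+} is \pi_\theta restricted to \mathcal{B} and renormalized, and \pi_\theta^{-} is \pi_\theta renormalized over trajectories outside \mathcal{B}. Under this reading, the first term can be enumerated exactly over the (small) buffer, so Monte Carlo sampling is only needed for the second term, which is what reduces the variance of the gradient estimate.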
Improving exploration in policy gradient search: Application to symbolic optimization
TLDR: This work presents two exploration methods, building upon ideas of entropy regularization and distribution initialization, that can improve performance, increase sample efficiency, and lower the complexity of solutions for the task of symbolic regression.
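As a rough illustration of the entropy-regularization idea mentioned in the summary above (a generic policy-gradient sketch, not that paper's implementation; the function and coefficient names are ours):

import torch

def pg_loss_with_entropy(logits, actions, returns, entropy_coef=0.01):
    # REINFORCE-style loss with an entropy bonus that keeps the policy stochastic.
    # logits:  (batch, num_actions) unnormalized action scores from the policy network
    # actions: (batch,) indices of the sampled actions
    # returns: (batch,) returns (or advantages) observed for those actions
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_term = -(chosen * returns).mean()                # maximize return-weighted log-likelihood
    entropy = -(probs * log_probs).sum(dim=-1).mean()   # average policy entropy
    return pg_term - entropy_coef * entropy             # larger entropy_coef -> more exploration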
Learning Self-Imitating Diverse Policies
TLDR: A self-imitation learning algorithm that explores and exploits well in sparse, episodic-reward settings; it reduces to a policy-gradient algorithm with shaped rewards learned from experience replays, using Stein variational policy gradient with the Jensen-Shannon divergence.
Learning to Generalize from Sparse and Underspecified Rewards
TLDR: This work proposes Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined feedback for learning; MeRL outperforms an alternative reward-learning technique based on Bayesian optimization and achieves state-of-the-art results on weakly supervised semantic parsing.
Program Synthesis Using Deduction-Guided Reinforcement Learning
TLDR: Proposes a new variant of the policy gradient algorithm that incorporates feedback from a deduction engine into the underlying statistical model, combining the power of deductive and statistical reasoning in a unified framework.
Less is More: Data-Efficient Complex Question Answering over Knowledge Bases
TLDR: Proposes the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering that uses only a modest number of training samples and outperforms state-of-the-art models on two datasets.
Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning
TLDR: The goal is to learn a neural semantic parser when only prior knowledge about a limited number of simple rules is available, without access to either annotated programs or execution results; the approach is initialized with rules and improved in a back-translation paradigm.
Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs
TLDR: This work capitalizes on the intuition that correct programs would likely respect certain structural constraints were they aligned to the question, and proposes to model alignments as structured latent variables within a latent-alignment framework.
Unsupervised Learning of KB Queries in Task-Oriented Dialogs
TLDR: This work defines the novel problems of predicting the KB query and training the dialog agent without explicit KB query annotation, and proposes a reinforcement learning (RL) baseline that rewards the generation of queries whose KB results cover the entities mentioned in subsequent dialog.
Semantic parsing aims to map natural language utterances onto machine-interpretable meaning representations, a.k.a. programs whose execution against a real-world environment produces a denotation.
Weakly Supervised Neuro-Symbolic Module Networks for Numerical Reasoning
TLDR: Proposes WNSMN, a Weakly-Supervised Neuro-Symbolic Module Network trained with answers as the sole supervision for numerical-reasoning-based MRC; it outperforms NMN by 32% and the reasoning-free language model GenBERT by 8% in exact-match accuracy when trained under comparable weakly supervised settings.

References

Showing 1-10 of 71 references
Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
TLDR: Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, which leads to improved accuracy of the models, especially in cases where the training data is limited.
Neural Program Synthesis with Priority Queue Training
TLDR: By adding a program length penalty to the reward function, this work is able to synthesize short, human-readable programs in a simple but expressive Turing-complete programming language called BF.
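The length-penalty idea above is simple enough to sketch (the penalty coefficient and example programs below are illustrative, not values from the paper):

def length_penalized_reward(task_reward, program, penalty_per_token=0.01):
    # Subtract a small per-token penalty so that, among programs with equal task
    # reward, shorter (more readable) programs receive a higher shaped reward.
    return task_reward - penalty_per_token * len(program)

# Two hypothetical BF programs that both solve a task (task_reward = 1.0):
assert length_penalized_reward(1.0, "+[->+<]") > length_penalized_reward(1.0, "+[->+<]>><<++--")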
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
TLDR: The experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand-manipulation and synthetic tasks, and that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.
Bridging the Gap Between Value and Policy Based Reinforcement Learning
TLDR: A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces, and significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
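The "soft consistency" that PCL penalizes has a standard form in entropy-regularized RL; the notation below (\tau for the entropy temperature, d for the rollout length) is ours, and the equation is a textbook statement of the identity rather than a quotation from the paper:

  % multi-step path consistency (sketch): holds for the optimal policy pi* and value V*
  V^*(s_t) - \gamma^{d} V^*(s_{t+d})
    \;=\; \sum_{i=0}^{d-1} \gamma^{i}\big[\, r(s_{t+i}, a_{t+i}) - \tau \log \pi^*(a_{t+i}\mid s_{t+i}) \,\big]

PCL trains the policy and value jointly by minimizing the squared violation of this identity on sub-trajectories drawn from both on-policy and replayed (off-policy) traces.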
Reward Augmented Maximum Likelihood for Neural Structured Prediction
TLDR: This paper presents a simple and computationally efficient approach to incorporating task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated, scaled rewards.
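The proportionality stated in that summary can be written directly; here \tau is the temperature (scale) hyperparameter and r(y, y^*) the task reward of output y against the reference y^* (our notation):

  % RAML target distribution (sketch): exponentiated, scaled rewards
  q(y \mid x) \;\propto\; \exp\!\big( r(y, y^{*}) / \tau \big)

Training then amounts to maximizing \mathbb{E}_{y \sim q(\cdot \mid x)}\big[\log p_\theta(y \mid x)\big], i.e. maximum likelihood on outputs sampled in proportion to their exponentiated scaled rewards.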
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood
TLDR: The goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available; a new algorithm is presented that guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL.
VIME: Variational Information Maximizing Exploration
TLDR: VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics, which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR: A new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
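The clipped form of that surrogate is the variant most commonly associated with PPO; the sketch below shows it as a loss to minimize (function and argument names are ours):

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # new_log_probs: log pi_theta(a|s) under the current policy
    # old_log_probs: log pi_theta_old(a|s) from the policy that collected the data
    # advantages:    advantage estimates for the sampled (state, action) pairs
    ratio = torch.exp(new_log_probs - old_log_probs)                        # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                            # pessimistic surrogate, negated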
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
TLDR: A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
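A minimal sketch of a count-based bonus over hashed state codes (the hash choice and the coefficient beta are illustrative assumptions; the bonus decays as 1/sqrt(count), the form typically used for count-based exploration):

import hashlib
from collections import defaultdict

class HashCountBonus:
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)   # visit counts per hashed state code

    def _code(self, state):
        # Any deterministic discretization works; here we simply hash repr(state).
        return hashlib.md5(repr(state).encode()).hexdigest()

    def bonus(self, state):
        code = self._code(state)
        self.counts[code] += 1
        return self.beta / (self.counts[code] ** 0.5)

# Usage during training: total_reward = env_reward + explorer.bonus(observation)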