# Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

@inproceedings{Liang2018MemoryAP, title={Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing}, author={Chen Liang and Mohammad Norouzi and Jonathan Berant and Quoc V. Le and N. Lao}, booktitle={NeurIPS}, year={2018} }

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over the trajectories outside the buffer.
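As a rough numerical illustration of this decomposition (a minimal sketch, not the paper's implementation; the toy trajectory sets, probabilities, and rewards below are invented), the objective can be computed exactly when the trajectory space is enumerable:

```python
def mapo_objective(buffer_trajs, outside_trajs, policy_prob, reward):
    """O = pi_B * E_in[R] + (1 - pi_B) * E_out[R], where pi_B is the
    total policy probability mass on buffer trajectories."""
    pi_b = sum(policy_prob[t] for t in buffer_trajs)
    # Expectation over high-reward trajectories inside the buffer,
    # enumerated exactly (the buffer is small and discrete).
    e_in = sum(policy_prob[t] * reward[t] for t in buffer_trajs) / pi_b
    # Expectation over trajectories outside the buffer; in practice this
    # term is estimated by sampling from the policy and rejecting
    # trajectories already stored in the buffer.
    e_out = sum(policy_prob[t] * reward[t] for t in outside_trajs) / (1.0 - pi_b)
    return pi_b * e_in + (1.0 - pi_b) * e_out

# Toy example: one high-reward trajectory stored in the buffer.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
rewards = {"a": 1.0, "b": 0.0, "c": 0.0}
obj = mapo_objective({"a"}, {"b", "c"}, probs, rewards)
```

Note that the weighted sum recovers the plain expected return, which is exactly why the reweighting leaves the objective unbiased while letting the buffer term be computed with low variance.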

#### Supplemental Content

GitHub repo (via Papers with Code): Neural Symbolic Machines, a framework that integrates neural networks and symbolic representations using reinforcement learning, with applications in program synthesis and semantic parsing.

#### 78 Citations

Improving exploration in policy gradient search: Application to symbolic optimization

- Computer Science, Mathematics
- ArXiv
- 2021

This work presents two exploration methods, building on ideas of entropy regularization and distribution initialization, that improve performance, increase sample efficiency, and lower the complexity of solutions for the task of symbolic regression.
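For context, the entropy-regularization idea mentioned here adds a policy-entropy bonus to the objective to discourage premature collapse onto a few actions (a generic sketch; the function name and coefficient are assumptions, not taken from the cited paper):

```python
import math

def entropy_bonus(action_probs, coef=0.01):
    """Policy entropy H(pi) = -sum_a p(a) log p(a), scaled by a small
    coefficient and added to the policy-gradient objective."""
    h = -sum(p * math.log(p) for p in action_probs if p > 0)
    return coef * h
```

A uniform policy maximizes the bonus and a deterministic one receives zero, so the regularizer pushes the policy toward continued exploration.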

Learning Self-Imitating Diverse Policies

- Computer Science, Mathematics
- ICLR
- 2019

A self-imitation learning algorithm that exploits and explores well in sparse and episodic reward settings; it can be reduced to a policy-gradient algorithm with shaped rewards learned from experience replay, using Stein variational policy gradient with Jensen-Shannon divergence.

Learning to Generalize from Sparse and Underspecified Rewards

- Computer Science, Mathematics
- ICML
- 2019

This work proposes Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined feedback for learning; it outperforms an alternative reward-learning technique based on Bayesian optimization and achieves the state of the art on weakly supervised semantic parsing.

Program Synthesis Using Deduction-Guided Reinforcement Learning

- Computer Science
- CAV
- 2020

A new variant of the policy gradient algorithm is proposed that incorporates feedback from a deduction engine into the underlying statistical model, combining the power of deductive and statistical reasoning in a unified framework.

Less is More: Data-Efficient Complex Question Answering over Knowledge Bases

- Computer Science
- J. Web Semant.
- 2020

The Neural-Symbolic Complex Question Answering (NS-CQA) model is proposed: a data-efficient reinforcement learning framework for complex question answering that uses only a modest number of training samples and outperforms state-of-the-art models on two datasets.

Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning

- Computer Science
- AAAI
- 2020

The goal is to learn a neural semantic parser when only prior knowledge about a limited number of simple rules is available, without access to either annotated programs or execution results; the approach is initialized with rules and improved in a back-translation paradigm.

Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs

- Computer Science
- EMNLP
- 2019

This work capitalizes on the intuition that correct programs would likely respect certain structural constraints were they aligned to the question, and proposes to model alignments as structured latent variables within a latent-alignment framework.

Unsupervised Learning of KB Queries in Task-Oriented Dialogs

- Computer Science, Mathematics
- Transactions of the Association for Computational Linguistics
- 2021

This work defines the novel problems of predicting the KB query and training the dialog agent without explicit KB query annotation, and proposes a reinforcement learning (RL) baseline that rewards the generation of queries whose KB results cover the entities mentioned in subsequent dialog.

[Title garbled in extraction; surviving figure labels contrast a correct program (instantiation, execution, denotation) with spurious and inconsistent programs]

- 2019

Semantic parsing aims to map natural language utterances onto machine-interpretable meaning representations, i.e., programs whose execution against a real-world environment produces a denotation.

Weakly Supervised Neuro-Symbolic Module Networks for Numerical Reasoning

- Computer Science
- ArXiv
- 2021

WNSMN, a Weakly-Supervised Neuro-Symbolic Module Network trained with answers as the sole supervision for numerical-reasoning-based MRC, is proposed; it outperforms NMN by 32% and the reasoning-free language model GenBERT by 8% in exact-match accuracy when trained under comparable weakly supervised settings.

#### References

Showing 1–10 of 71 references.

Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

- Computer Science, Mathematics
- ICLR
- 2018

Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, which improves model accuracy, especially when training data is limited.

Neural Program Synthesis with Priority Queue Training

- Computer Science
- ArXiv
- 2018

By adding a program length penalty to the reward function, this work is able to synthesize short, human-readable programs in a simple but expressive Turing-complete programming language called BF.
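The length-penalty shaping described here can be sketched as follows (the coefficient value and function name are assumptions for illustration, not taken from the cited paper):

```python
def shaped_reward(task_reward, program, alpha=0.01):
    """Task reward minus a per-character length penalty, steering the
    priority queue toward short, readable programs."""
    return task_reward - alpha * len(program)

# Of two BF programs earning the same task reward, the shorter scores higher.
short_score = shaped_reward(1.0, "+" * 10)
long_score = shaped_reward(1.0, "+" * 30)
```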

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

- Computer Science, Mathematics
- ICLR
- 2018

The experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand-manipulation and synthetic tasks, and that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.

Bridging the Gap Between Value and Policy Based Reinforcement Learning

- Computer Science, Mathematics
- NIPS
- 2017

A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces; it significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.

Reward Augmented Maximum Likelihood for Neural Structured Prediction

- Computer Science, Mathematics
- NIPS
- 2016

This paper presents a simple and computationally efficient approach to incorporating task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of outputs given inputs is proportional to their exponentiated scaled rewards.
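That exponentiated-payoff distribution can be written out directly (a sketch; `tau` is the temperature scaling the rewards, and the candidate set is assumed small enough to normalize exactly):

```python
import math

def exponentiated_payoff(rewards, tau=1.0):
    """q(y) proportional to exp(r(y) / tau): the target distribution
    sampled from in place of the single ground-truth output."""
    weights = [math.exp(r / tau) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]
```

As `tau` shrinks toward zero the distribution concentrates on the highest-reward candidate; as it grows, the distribution approaches uniform.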

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

- Computer Science, Mathematics
- ACL
- 2017

The goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available; a new algorithm is presented that guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL.

VIME: Variational Information Maximizing Exploration

- Computer Science, Mathematics
- NIPS
- 2016

VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

- Computer Science, Mathematics
- ICML
- 2018

A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.

Proximal Policy Optimization Algorithms

- Computer Science
- ArXiv
- 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
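The clipped form of that surrogate objective, for a single sample, looks roughly like this (a sketch of the per-sample term; `eps` is the clip range):

```python
def ppo_clip_term(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s). Clipping removes the
    incentive to move the ratio far outside [1 - eps, 1 + eps]."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

Taking the minimum makes the bound pessimistic: the policy gains nothing from pushing the probability ratio beyond the clip range in the direction the advantage favors.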

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

- Computer Science, Mathematics
- NIPS
- 2017

A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and simple hash functions are found to achieve surprisingly good results on many challenging tasks.
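The count-based bonus with hashing reduces to a few lines (a sketch: the bonus form beta / sqrt(n) follows the count-based idea, while the coefficient and hash function here are placeholders):

```python
from collections import defaultdict

def make_count_bonus(hash_fn, beta=0.5):
    """Returns a function mapping a state to an exploration bonus
    beta / sqrt(n(hash(state))), where n counts visits to the state's
    hash code; rarely visited states earn larger bonuses."""
    counts = defaultdict(int)

    def bonus(state):
        code = hash_fn(state)
        counts[code] += 1
        return beta / counts[code] ** 0.5

    return bonus
```

Hashing lets the discrete counting scheme extend to high-dimensional or continuous states: any states that collide under `hash_fn` share one visit count.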