Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, Ni Lao

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization. Our key idea is to express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside a memory buffer, and a separate…
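
The weighted-sum idea can be illustrated on a toy problem. The abstract is cut off before it describes the second term, so the sketch below assumes that term is an expectation over the trajectories outside the buffer, which is what makes the split an exact identity. The trajectory names, probabilities, rewards, and function names are all hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
# Toy illustration only: every name and number below is hypothetical.
# ASSUMPTION: the second term of the weighted sum is an expectation over
# trajectories outside the memory buffer (the abstract is truncated there).

def expected_return(probs, rewards):
    """Exact expected return by enumerating every trajectory."""
    return sum(probs[t] * rewards[t] for t in probs)


def buffer_decomposition(probs, rewards, buffer):
    """Expected return written as a weighted sum of two expectations."""
    # Total probability the policy assigns to trajectories in the buffer.
    pi_b = sum(probs[t] for t in buffer)

    # Expectation under the policy renormalized over the buffer.
    inside = (
        sum(probs[t] / pi_b * rewards[t] for t in buffer) if pi_b > 0 else 0.0
    )

    # Expectation under the policy renormalized over everything else.
    outside_keys = [t for t in probs if t not in buffer]
    outside = (
        sum(probs[t] / (1 - pi_b) * rewards[t] for t in outside_keys)
        if pi_b < 1
        else 0.0
    )

    return pi_b * inside + (1 - pi_b) * outside


# Four possible trajectories in a deterministic toy environment.
probs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
rewards = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}
buffer = {"a", "c"}  # the high-reward trajectories found so far

full = expected_return(probs, rewards)
split = buffer_decomposition(probs, rewards, buffer)
print(full, split)
```

In this toy case both quantities agree up to floating-point rounding, because the two renormalized expectations, weighted by the probability mass inside and outside the buffer, partition the full sum. In a realistic setting the space of trajectories is far too large to enumerate, so the second expectation would have to be estimated, for example by sampling.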

Key Quantitative Results

  • On the WIKITABLEQUESTIONS benchmark, we improve the state-of-the-art by 2.6%, achieving an accuracy of 46.3%. On the WIKISQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision.

