Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

@inproceedings{Liang2018MemoryAP,
  title={Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing},
  author={Chen Liang and Mohammad Norouzi and Jonathan Berant and Quoc Le and Ni Lao},
  year={2018}
}
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation…
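The split of the expected return into an in-buffer and an out-of-buffer expectation can be checked numerically on a toy problem. The following is a minimal sketch that makes several assumptions not stated in the abstract. First, it assumes a trajectory space small enough to enumerate. Second, it assumes the two weights are the policy's total probability on the buffer, π_B, and its complement, 1 − π_B. Third, the names `policy`, `reward`, `buffer`, `expected_return`, and `decomposed_return` are hypothetical and used only for illustration. The sketch is not the authors' implementation and does not model how either term is estimated during training.

```python
import math

# Hypothetical toy setting (not from the paper): a deterministic environment whose
# complete trajectories can be enumerated, each with a fixed reward R(tau) and a
# probability pi(tau) under the current policy.
policy = {"a": 0.1, "b": 0.3, "c": 0.4, "d": 0.2}  # pi(tau); sums to 1
reward = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.5}  # R(tau)
buffer = {"a", "b"}  # memory buffer holding the high-reward trajectories


def expected_return(pi, r):
    """Plain expected return: sum over all trajectories of pi(tau) * R(tau)."""
    return sum(p * r[t] for t, p in pi.items())


def decomposed_return(pi, r, mem):
    """Rewrite the expected return as a weighted sum of two expectations.

    pi_b is the probability mass the policy puts on the buffer (an assumed
    choice of weight). The in-buffer expectation renormalizes pi over the
    buffer; the out-of-buffer expectation renormalizes pi over everything
    else. The two are combined with weights pi_b and 1 - pi_b.
    """
    pi_b = sum(pi[t] for t in mem)
    # Guard against an empty (or zero-probability) buffer.
    inside = sum(pi[t] * r[t] for t in mem) / pi_b if pi_b > 0 else 0.0
    rest = [t for t in pi if t not in mem]
    # Guard against a buffer that already holds all of the probability mass.
    outside = sum(pi[t] * r[t] for t in rest) / (1.0 - pi_b) if pi_b < 1 else 0.0
    return pi_b * inside + (1.0 - pi_b) * outside, pi_b, inside, outside


if __name__ == "__main__":
    total, pi_b, inside, outside = decomposed_return(policy, reward, buffer)
    print(f"pi_B={pi_b:.2f}  in-buffer={inside:.3f}  out-of-buffer={outside:.3f}")
    print(f"decomposed={total:.3f}  direct={expected_return(policy, reward):.3f}")
```

Because the decomposition is an identity, it holds for any choice of buffer, including an empty one. Which term is enumerated and which is sampled is a separate design choice that this sketch leaves out.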