Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

@inproceedings{Liang2018MemoryAP,
  title={Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing},
  author={Chen Liang and Mohammad Norouzi and Jonathan Berant and Quoc V. Le and Ni Lao},
  booktitle={NeurIPS},
  year={2018}
}
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization. Our key idea is to express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside a memory buffer, and a separate expectation over the trajectories outside of the buffer.
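As a concrete illustration of the weighted-sum idea, the sketch below computes the resulting gradient estimate for a toy categorical policy over a handful of candidate programs. This is not the paper's released implementation; the setup and names (logits, buffer, reward, mapo_gradient) are assumptions made for illustration. The in-buffer term is enumerated exactly under the renormalized buffer distribution, the out-of-buffer term is estimated by rejection sampling, and the two are combined with weights pi_B and 1 - pi_B.

```python
# Minimal sketch of a MAPO-style gradient estimate (illustrative, not the
# authors' code). Toy setup: a categorical policy pi_theta = softmax(logits)
# over a small set of "programs", sparse binary reward R(a), and a memory
# buffer B holding the high-reward programs discovered so far.
import numpy as np

rng = np.random.default_rng(0)

n_programs = 10
logits = rng.normal(size=n_programs)      # policy parameters theta
reward = np.zeros(n_programs)
reward[[2, 5, 7]] = 1.0                   # toy sparse rewards R(a)
buffer = [2]                              # memory B: rewarded programs found so far

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def grad_log_pi(a, probs):
    """Gradient of log pi(a) w.r.t. the logits: one_hot(a) - pi."""
    g = -probs.copy()
    g[a] += 1.0
    return g

def mapo_gradient(logits, n_samples=32):
    probs = softmax(logits)
    pi_B = probs[buffer].sum()            # total probability mass of the buffer

    # 1) Exact expectation over the buffer under the renormalized
    #    distribution pi+(a) = pi(a) / pi_B for a in B.
    g_in = np.zeros_like(logits)
    for a in buffer:
        g_in += (probs[a] / pi_B) * grad_log_pi(a, probs) * reward[a]

    # 2) Monte Carlo expectation outside the buffer via rejection sampling.
    g_out = np.zeros_like(logits)
    kept = 0
    for _ in range(n_samples):
        a = rng.choice(n_programs, p=probs)
        if a in buffer:                   # reject samples that fall in the buffer
            continue
        g_out += grad_log_pi(a, probs) * reward[a]
        kept += 1
    if kept > 0:
        g_out /= kept

    # Weighted sum of the two terms.
    return pi_B * g_in + (1.0 - pi_B) * g_out

print(mapo_gradient(logits))
```

Enumerating the buffer exactly removes the sampling variance of the in-buffer term, which matters because most of the reward mass concentrates on the buffered trajectories once good programs have been found; the estimator stays unbiased because the out-of-buffer term is still sampled from the policy restricted to trajectories outside the buffer.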

Key Quantitative Results

  • On the WikiTableQuestions benchmark, we improve the state-of-the-art by 2.6%, achieving an accuracy of 46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision.
