High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the…
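The abstract is cut off before it describes the estimator itself, so the following is only a minimal Python sketch of the commonly used formulation of generalized advantage estimation, not code taken from this page. It assumes a single trajectory, a list of per-step value estimates with one extra bootstrap entry at the end, and the usual TD residual `delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)` accumulated with weight `gamma * lam`. The function name `gae_advantages` and the default values of `gamma` and `lam` are illustrative assumptions.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (assumed default, for illustration)
    lam: float = 0.95,    # GAE mixing parameter (assumed default, for illustration)
) -> List[float]:
    """Compute advantage estimates for one trajectory.

    `values` must hold one more entry than `rewards`: the last entry is the
    value estimate of the state reached after the final reward (use 0.0 if
    the episode terminated there).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated sum from later steps.
    for t in reversed(range(len(rewards))):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and all later residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

In this sketch, setting `lam=0` leaves only the one-step TD residual at each step, while `lam=1` gives the discounted sum of rewards plus the bootstrap term minus the value baseline; intermediate values trade bias against variance.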


Semantic Scholar estimates that this publication has 320 citations based on the available data.