Stochastic policy gradient reinforcement learning on a simple 3D biped

Abstract

We present a learning system which is able to quickly and reliably acquire a robust feedback control policy for 3D dynamic walking from a blank slate using only trials implemented on our physical robot. The robot begins walking within a minute and learning converges in approximately 20 minutes. This success can be attributed to the mechanics of our robot…
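To make the idea of stochastic policy gradient learning concrete, here is a minimal sketch of a generic likelihood-ratio (REINFORCE-style) update with a running-average reward baseline. It is not claimed to be the algorithm used on the biped. The toy one-step task (reward `-(x + u)**2`), the linear Gaussian feedback policy `u ~ N(theta * x, sigma**2)`, the function name `train_linear_gaussian_policy`, and all parameter values are hypothetical assumptions chosen for illustration.

```python
import random


def train_linear_gaussian_policy(iterations=3000, sigma=0.3, alpha=0.02, seed=0):
    """Generic stochastic policy gradient sketch (hypothetical toy task).

    Observe a state x, apply action u, receive reward r = -(x + u)**2.
    The best deterministic feedback law is u = -x, i.e. theta = -1.
    The policy is Gaussian: u ~ N(theta * x, sigma**2).
    """
    rng = random.Random(seed)
    theta = 0.0      # feedback gain being learned (hypothetical)
    baseline = 0.0   # running average of reward, used to reduce variance
    for _ in range(iterations):
        x = rng.uniform(-1.0, 1.0)          # sampled state
        mean = theta * x
        u = rng.gauss(mean, sigma)          # exploratory (stochastic) action
        r = -(x + u) ** 2                   # one-step reward
        # d/dtheta log N(u; theta * x, sigma^2) = (u - theta * x) * x / sigma^2
        grad_log_pi = (u - mean) * x / sigma ** 2
        theta += alpha * (r - baseline) * grad_log_pi
        baseline += 0.05 * (r - baseline)   # update baseline after using it
    return theta


if __name__ == "__main__":
    print(train_linear_gaussian_policy())   # should land near -1.0
```

Only the reward signal and the policy's own noise are used to estimate the gradient; no model of the plant is required, which is the property that makes this family of methods attractive for learning from trials on hardware.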
DOI: 10.1109/IROS.2004.1389841

5 Figures and Tables

Statistics

[Chart: Citations per Year, 2005 to 2017]

278 Citations

Semantic Scholar estimates that this publication has 278 citations based on the available data.