Sujuan Wei

We present a novel online learning control algorithm (OLCPA) that comprises projected gradient temporal difference for action-value function (PGTDAVF) and advanced heuristic dynamic programming with one-step delay (AHD-POSD). PGTDAVF can guarantee the convergence of temporal difference (TD)-based policy learning with smooth action-value function ...