Reinforcement Learning Using Expectation Maximization Based Guided Policy Search for Stochastic Dynamics

Prakash Mallick, Zhiyong Chen, Mohsen Zamani

An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
A new expectation maximization algorithm is derived for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized as a mixture of Gaussians, making the approach more flexible and general than closed-form solutions.
Guided Policy Search
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. It shows how differential dynamic programming can generate suitable guiding samples, and describes a regularized importance-sampled policy optimization that incorporates these samples into the policy search.
A Survey on Policy Search for Robotics
This work classifies model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy and presents a unified view on existing algorithms.
Trust Region Policy Optimization
A method for optimizing control policies with guaranteed monotonic improvement; making several approximations to the theoretically justified scheme yields a practical algorithm called Trust Region Policy Optimization (TRPO).
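The two quantities TRPO tracks can be illustrated concretely: an importance-sampled surrogate objective and a KL trust-region constraint. A minimal sketch (the function names and the scalar-Gaussian KL are illustrative, not taken from the paper):

```python
import numpy as np

def surrogate_objective(logp_new, logp_old, advantages):
    # Importance-sampled surrogate L(theta) = E[ pi_new(a|s)/pi_old(a|s) * A(s,a) ],
    # estimated from actions sampled under the old policy.
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    return float(np.mean(ratio * np.asarray(advantages)))

def gaussian_kl(mu0, sig0, mu1, sig1):
    # KL( N(mu0, sig0^2) || N(mu1, sig1^2) ) for scalar Gaussian policies;
    # TRPO constrains the mean KL between old and new policies per update.
    return np.log(sig1 / sig0) + (sig0**2 + (mu0 - mu1)**2) / (2.0 * sig1**2) - 0.5
```

An update is accepted only while the mean KL stays below a small threshold (e.g. 0.01), which is what keeps the monotonic-improvement bound approximately valid in practice.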
Path integral guided policy search
This work presents a policy search method for learning complex feedback control policies that map from high-dimensional sensory inputs to motor torques, for manipulation tasks with discontinuous contact dynamics, and demonstrates that this approach substantially outperforms the prior LQR-based local policy optimizer on these tasks.
Probabilistic inference for solving discrete and continuous state Markov Decision Processes
An Expectation Maximization algorithm for computing optimal policies is presented; it directly optimizes the discounted expected future return for arbitrary reward functions, without assuming an ad hoc finite total time.
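The flavor of such EM-based policy updates can be sketched with a reward-weighted M-step for a scalar Gaussian policy (a deliberate simplification; the paper's algorithm handles general discrete and continuous MDPs):

```python
import numpy as np

def em_policy_update(actions, rewards):
    # EM-style M-step for a Gaussian policy: the new mean and variance are
    # the reward-weighted moments of the sampled actions (reward-weighted
    # regression flavor of EM policy search). Assumes positive rewards so
    # they can serve as responsibilities.
    w = np.asarray(rewards, dtype=float)
    w = w / w.sum()
    a = np.asarray(actions, dtype=float)
    mu = float(np.sum(w * a))
    var = float(np.sum(w * (a - mu) ** 2))
    return mu, var
```

Iterating sampling (E-step) and this weighted refit (M-step) shifts the policy toward high-reward actions without ever computing a policy gradient.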
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
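PILCO's key step is propagating a Gaussian state belief through an uncertain dynamics model; it does this analytically via moment matching with GP dynamics, but the idea can be sketched with a Monte Carlo stand-in (the linear-Gaussian `dynamics` below is a placeholder, not PILCO's learned model):

```python
import numpy as np

def propagate_belief(mean, var, dynamics, n_samples=200_000, seed=0):
    # Push a Gaussian belief N(mean, var) over a scalar state one step
    # through a stochastic dynamics model and return the moments of the
    # resulting state distribution (Monte Carlo stand-in for PILCO's
    # analytic moment matching).
    rng = np.random.default_rng(seed)
    x = rng.normal(mean, np.sqrt(var), size=n_samples)
    y = dynamics(x, rng)
    return float(y.mean()), float(y.var())

# Placeholder probabilistic model: x' = 0.9 x + eps, eps ~ N(0, 0.1^2)
def linear_gaussian(x, rng):
    return 0.9 * x + rng.normal(0.0, 0.1, size=x.shape)
```

Chaining this step over a horizon, and differentiating the resulting expected cost through the propagated moments, is what lets PILCO plan with model uncertainty instead of a single point estimate.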
Information theoretic MPC for model-based reinforcement learning
An information theoretic model predictive control algorithm capable of handling complex cost criteria and general nonlinear dynamics and using multi-layer neural networks as dynamics models to solve model-based reinforcement learning tasks is introduced.
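The information-theoretic update at the heart of this controller family (MPPI) reduces to exponentiated-cost weighting of sampled rollouts; a minimal sketch, assuming each rollout has a scalar total cost:

```python
import numpy as np

def mppi_weights(costs, lam=1.0):
    # Weight K sampled control sequences by w_k proportional to
    # exp(-S_k / lambda), where S_k is the rollout cost and lambda is the
    # temperature. Subtracting the minimum cost keeps the exponentials
    # numerically stable without changing the normalized weights.
    costs = np.asarray(costs, dtype=float)
    z = np.exp(-(costs - costs.min()) / lam)
    return z / z.sum()
```

The planned control is then the weighted average of the sampled control perturbations; as lambda shrinks, the lowest-cost rollouts dominate the update.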
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
This work enables a model-based algorithm based on the linear-quadratic regulator that can be integrated into the model-free framework of path integral policy improvement and can further combine with guided policy search to train arbitrary parameterized policies such as deep neural networks.
Policy Gradient Methods for Robotics
  • Jan Peters, S. Schaal · 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems · 2006
An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, showing how the most recently developed methods can significantly improve learning performance.