Robust Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization
@article{Mallick2020RobustSO,
  title   = {Robust Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization},
  author  = {Prakash Mallick and Zhiyong Chen},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.00207}
}
Trajectory optimization is a fundamental stochastic optimal control problem. This paper deals with a trajectory optimization approach for unknown, complex systems subject to stochastic sensor noise. The proposed methodology combines the benefits of conventional optimal control procedures with the advantages of maximum likelihood approaches to deliver a novel iterative trajectory optimization paradigm called Stochastic Optimal Control - Expectation Maximization (SOC-EM). This…
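Although only the truncated abstract is shown here, the EM-flavored structure it describes can be sketched at a high level. The following is a minimal structural sketch, assuming a latent-state formulation in which an E-step infers the state trajectory from noisy sensor data and an M-step refits local dynamics and updates the controller; every name below is an illustrative placeholder, not the authors' implementation.

```python
# Structural sketch of an EM-style trajectory optimization loop (illustrative only).
# Assumes: rollout_fn runs the current controller and returns noisy observations and
# actions; smoother performs latent-state inference (e.g. Kalman smoothing);
# fit_dynamics and improve_controller stand in for the M-step.
def em_trajectory_optimization(controller, rollout_fn, smoother,
                               fit_dynamics, improve_controller,
                               n_iterations=20):
    for _ in range(n_iterations):
        observations, actions = rollout_fn(controller)          # collect noisy data
        state_posterior = smoother(observations, actions)       # E-step: infer states
        dynamics = fit_dynamics(state_posterior, actions)       # M-step: refit model
        controller = improve_controller(dynamics, state_posterior)  # update policy
    return controller
```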
One Citation
Reinforcement Learning Using Expectation Maximization Based Guided Policy Search for Stochastic Dynamics
- Computer Science
- Neurocomputing
- 2022
References
SHOWING 1-10 OF 56 REFERENCES
End-to-End Training of Deep Visuomotor Policies
- Computer Science
- J. Mach. Learn. Res.
- 2016
This paper develops a method for learning policies that map raw image observations directly to torques at the robot's motors; the policies are trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Guided Policy Search via Approximate Mirror Descent
- Computer Science
- NIPS
- 2016
A new guided policy search algorithm is derived that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings; in the more general nonlinear setting, the error in the projection step is shown to be bounded.
Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search
- Computer Science
- 2016 IEEE International Conference on Robotics and Automation (ICRA)
- 2016
This work proposes combining MPC with reinforcement learning in the framework of guided policy search: MPC generates data at training time under full state observations provided by an instrumented training environment, and a deep neural network policy is trained that can control the robot without knowledge of the full state.
Guided Policy Search
- Computer Science
- ICML
- 2013
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance-sampled policy optimization that incorporates these samples into the policy search.
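As a point of reference for the importance-sampling step mentioned in that summary, the snippet below shows a self-normalized importance-sampling estimate of expected return computed from off-policy guiding samples. It is a simplified illustration of the general idea, not the paper's exact regularized estimator.

```python
import numpy as np

def is_return_estimate(logp_policy, logp_sampler, returns):
    """Estimate the current policy's expected return from trajectories drawn under a
    different (guiding) distribution via self-normalized importance sampling.
    Inputs are per-trajectory log-probabilities and returns."""
    log_w = logp_policy - logp_sampler   # log importance weights
    log_w -= log_w.max()                 # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                         # self-normalize the weights
    return float(np.dot(w, returns))
```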
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference
- Computer Science
- Robotics: Science and Systems
- 2012
We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, not only providing a unifying perspective of previous approaches in this area, but also…
Linearly-solvable Markov decision problems
- Computer Science
- NIPS
- 2006
A class of MDPs that greatly simplifies reinforcement learning: these problems have discrete state spaces and continuous control spaces, and they enable efficient approximations to traditional MDPs.
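The "linear solvability" refers to the fact that, in this formulation, the exponentiated negative cost-to-go (the desirability function) satisfies a linear equation. The toy example below, with an invented five-state chain, iterates that linear fixed point for a first-exit problem and recovers the optimal cost-to-go; it is a sketch of the general structure, not code from the paper.

```python
import numpy as np

# Toy first-exit linearly-solvable MDP: the desirability z = exp(-v) satisfies the
# linear relation z(i) = exp(-q(i)) * sum_j p(j|i) z(j) on non-terminal states.
n = 5
q = np.array([1.0, 1.0, 1.0, 1.0, 0.0])   # state costs (state 4 is terminal)
terminal = np.array([False, False, False, False, True])

P = np.zeros((n, n))                      # passive (uncontrolled) dynamics
for i in range(n - 1):
    P[i, i] = 0.5
    P[i, i + 1] = 0.5

z = np.ones(n)
z[terminal] = np.exp(-q[terminal])
for _ in range(1000):                     # fixed-point iteration on the linear map
    z = np.exp(-q) * (P @ z)
    z[terminal] = np.exp(-q[terminal])

v = -np.log(z)                            # optimal cost-to-go
print(np.round(v, 3))
```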
Linear Models and Time-Series Analysis
- Mathematics, Computer Science
- Wiley Series in Probability and Statistics
- 2018
Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH sets a strong foundation for the linear model, univariate time series analysis, and some multivariate models associated primarily with modeling financial asset returns.
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
- Computer Science
- ICML
- 2017
This work presents a model-based algorithm based on the linear-quadratic regulator that can be integrated into the model-free framework of path integral policy improvement, and that can further be combined with guided policy search to train arbitrarily parameterized policies such as deep neural networks.
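The path-integral side of that combination reduces, in its simplest form, to reweighting sampled trajectories (or perturbed parameters) by the exponentiated negative cost. The snippet below is a minimal illustration of that update on an invented quadratic cost; it is not the paper's combined model-based/model-free rule.

```python
import numpy as np

def pi2_weighted_mean(samples, costs, temperature=1.0):
    """Average sampled parameters with weights proportional to exp(-cost / temperature)."""
    s = -(costs - costs.min()) / temperature   # shift costs for numerical stability
    w = np.exp(s)
    w /= w.sum()
    return (w[:, None] * samples).sum(axis=0)

rng = np.random.default_rng(0)
samples = rng.normal(size=(64, 3))             # e.g. perturbed feedforward commands
costs = np.sum((samples - 1.0) ** 2, axis=1)   # toy quadratic cost centered at 1.0
print(pi2_weighted_mean(samples, costs))       # result is pulled toward low-cost samples
```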
Reward Augmented Maximum Likelihood for Neural Structured Prediction
- Computer Science
- NIPS
- 2016
This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards.
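The key relation in that summary, that the optimal output distribution is proportional to exponentiated scaled rewards, is easy to illustrate on a tiny discrete output set. The example below (with invented rewards and model scores) builds that target distribution and evaluates the corresponding expected negative log-likelihood; it is a toy illustration, not the paper's training code.

```python
import numpy as np

rewards = np.array([0.0, -1.0, -2.0, -4.0])    # task rewards for 4 candidate outputs
tau = 1.0                                      # temperature scaling the rewards

target = np.exp(rewards / tau)
target /= target.sum()                         # q(y | x) proportional to exp(r / tau)

model_logits = np.array([2.0, 0.5, 0.1, -1.0]) # a model's scores for the same outputs
log_p = model_logits - np.log(np.exp(model_logits).sum())

raml_loss = -np.dot(target, log_p)             # E_q[ -log p_theta(y | x) ]
print(raml_loss)
```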