Corpus ID: 222090931

Robust Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization

@article{Mallick2020RobustSO,
  title={Robust Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization},
  author={Prakash Mallick and Zhiyong Chen},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.00207}
}
Trajectory optimization is a fundamental stochastic optimal control problem. This paper deals with a trajectory optimization approach for unknown, complicated systems subject to stochastic sensor noise. The proposed methodology combines the benefits of conventional optimal control procedures with the advantages of maximum likelihood approaches to deliver a novel iterative trajectory optimization paradigm, termed Stochastic Optimal Control - Expectation Maximization (SOC-EM). This…
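As a rough sketch of the control-as-inference idea behind such EM-based trajectory optimization (generic notation, not necessarily the paper's exact formulation): a binary optimality variable $O$ is introduced with likelihood $p(O \mid \tau) \propto \exp(-c(\tau))$ for a trajectory $\tau$ with cost $c$, and the controller parameters $\theta$ are improved by iterating an evidence lower bound:

\[
\log p_\theta(O) \;\ge\; \mathbb{E}_{q(\tau)}\!\big[\log p_\theta(O, \tau)\big] \;-\; \mathbb{E}_{q(\tau)}\!\big[\log q(\tau)\big],
\]

where the E-step sets $q(\tau) = p_\theta(\tau \mid O)$ and the M-step maximizes the resulting bound over $\theta$.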
1 Citation

References

SHOWING 1-10 OF 56 REFERENCES
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method for learning policies that map raw image observations directly to torques at the robot's motors; the policies are trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Guided Policy Search via Approximate Mirror Descent
TLDR
A new guided policy search algorithm is derived that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and it is shown that in the more general nonlinear setting, the error in the projection step can be bounded.
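For reference, the projection step mentioned here is, in mirror-descent-style guided policy search, a supervised KL projection of local controllers onto the policy class (a generic sketch in our notation, not the paper's exact update):

\[
\theta \;\leftarrow\; \arg\min_{\theta} \sum_{i} \mathbb{E}_{p_i}\!\Big[ \sum_{t} \mathrm{KL}\big( p_i(u_t \mid x_t) \,\big\|\, \pi_\theta(u_t \mid x_t) \big) \Big],
\]

where each $p_i$ is a time-varying local controller produced by trajectory optimization; the bounded error result concerns the gap this projection introduces.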
Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search
TLDR
This work proposes to combine MPC with reinforcement learning in the framework of guided policy search: MPC is used to generate data at training time, under full state observations provided by an instrumented training environment, and a deep neural network policy is then trained that can successfully control the robot without knowledge of the full state.
Guided Policy Search
TLDR
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, and shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance sampled policy optimization that incorporates these samples into the policy search.
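The regularized importance-sampled objective referred to here can be sketched as a normalized estimator of policy return from guiding samples $\tau_1, \dots, \tau_m \sim q$ (generic notation, not the paper's exact estimator):

\[
\hat{J}(\theta) \;=\; \frac{1}{Z(\theta)} \sum_{i=1}^{m} \frac{\pi_\theta(\tau_i)}{q(\tau_i)}\, r(\tau_i), \qquad Z(\theta) \;=\; \sum_{i=1}^{m} \frac{\pi_\theta(\tau_i)}{q(\tau_i)},
\]

where $q$ mixes the DDP-generated guiding distributions and a regularizer keeps $\pi_\theta$ close to regions where the estimator is reliable.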
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference
We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, not only providing a unifying perspective of previous approaches in this area, but also…
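Concretely, this class of reformulations typically expresses the stochastic optimal control problem as (a generic sketch):

\[
\pi^* \;=\; \arg\min_{\pi}\; \mathrm{KL}\Big( p_\pi(\tau) \,\Big\|\, \tfrac{1}{Z}\, p_0(\tau)\, \exp\big(-c(\tau)\big) \Big),
\]

where $p_\pi$ is the trajectory distribution induced by policy $\pi$, $p_0$ the distribution under passive (uncontrolled) dynamics, $c$ the trajectory cost, and $Z$ a normalizing constant; optimal control then becomes posterior inference over trajectories.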
Linearly-solvable Markov decision problems
TLDR
A class of MDPs is introduced that greatly simplifies reinforcement learning; these problems have discrete state spaces and continuous control spaces and enable efficient approximations to traditional MDPs.
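The simplification in question is Todorov's linearly-solvable construction: with state cost $q(x)$, control cost equal to the KL divergence from the passive dynamics $p(\cdot \mid x)$, and desirability $z(x) = \exp(-v(x))$ for optimal cost-to-go $v$, the Bellman equation becomes linear in $z$ (shown here for the first-exit formulation):

\[
z(x) \;=\; \exp\big(-q(x)\big) \sum_{x'} p(x' \mid x)\, z(x'), \qquad u^*(x' \mid x) \;=\; \frac{p(x' \mid x)\, z(x')}{\sum_{x''} p(x'' \mid x)\, z(x'')},
\]

so the optimal value function can be obtained by linear-algebra or eigenvector methods rather than nonlinear dynamic programming.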
Linear Estimation
Linear Models and Time-Series Analysis
Marc S. Paolella. Wiley Series in Probability and Statistics, 2018.
TLDR
Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH sets a strong foundation for the linear model, univariate time series analysis, and some multivariate models associated primarily with modeling financial asset returns.
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
TLDR
This work shows that a model-based algorithm based on the linear-quadratic regulator can be integrated into the model-free framework of path integral policy improvement, and can further be combined with guided policy search to train arbitrary parameterized policies such as deep neural networks.
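For context, the model-free half of this combination, path integral policy improvement (PI$^2$), reweights sampled trajectories by exponentiated cost (a generic sketch of PI$^2$-style weights, not the paper's exact update):

\[
w_i \;=\; \frac{\exp\big(-S(\tau_i)/\lambda\big)}{\sum_{j=1}^{N} \exp\big(-S(\tau_j)/\lambda\big)},
\]

where $S(\tau_i)$ is the cost-to-go of sample $i$ and $\lambda$ a temperature; the cited work interleaves such updates with LQR steps computed from fitted local dynamics models.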
Reward Augmented Maximum Likelihood for Neural Structured Prediction
TLDR
This paper presents a simple and computationally efficient approach to incorporating task reward into a maximum likelihood framework, and shows that the optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards.
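The stated optimality condition corresponds to the reward-augmented maximum likelihood objective, in which the empirical target distribution is replaced by an exponentiated-payoff distribution (generic notation):

\[
\mathcal{L}_{\mathrm{RAML}}(\theta) \;=\; -\sum_{(x,\, y^*)} \sum_{y} q(y \mid y^*; \tau)\, \log p_\theta(y \mid x), \qquad q(y \mid y^*; \tau) \;=\; \frac{\exp\big(r(y, y^*)/\tau\big)}{Z(y^*, \tau)},
\]

so the KL-regularized expected reward is maximized when the model's conditional distribution is proportional to $\exp\big(r(y, y^*)/\tau\big)$.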