The Value of Planning for Infinite-Horizon Model Predictive Control

@article{Hatch2021TheVO,
  title={The Value of Planning for Infinite-Horizon Model Predictive Control},
  author={Nathan Hatch and Byron Boots},
  journal={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  pages={7372--7378}
}
  • Published 7 April 2021
Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, will result in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MPC. If the value function is accurate, it effectively allows MPC to reason over an infinite horizon… 
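
As a rough illustration of the idea described in the abstract, the sketch below shows random-shooting MPC with a learned terminal cost. The helpers f (dynamics), c (stage cost), and V (learned cost-to-go), along with all parameter names, are hypothetical placeholders, not taken from the paper.

    import numpy as np

    def mpc_action(s0, f, c, V, horizon=10, n_samples=256, gamma=0.99, seed=0):
        # Score each sampled action sequence by its discounted H-step cost
        # plus gamma^H * V(terminal state): the learned value stands in for
        # the cost of the infinite tail beyond the planning horizon.
        rng = np.random.default_rng(seed)
        best_cost, best_action = np.inf, None
        for _ in range(n_samples):
            actions = rng.uniform(-1.0, 1.0, size=horizon)  # toy 1-D actions
            s, cost = s0, 0.0
            for t in range(horizon):
                cost += gamma**t * c(s, actions[t])
                s = f(s, actions[t])
            cost += gamma**horizon * V(s)
            if cost < best_cost:
                best_cost, best_action = cost, actions[0]
        return best_action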

Citations

Temporal Difference Learning for Model Predictive Control

TLDR
This work combines the strengths of model-free and model-based methods, using a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning.
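
As background on the temporal-difference component (an illustrative sketch, not the paper's training code), a one-step TD(0) update for a linear value estimate looks like:

    import numpy as np

    def td0_update(theta, phi, s, r, s_next, gamma=0.99, lr=1e-2):
        # Linear value estimate V(s) = theta @ phi(s); phi is an arbitrary
        # feature map. All names here are illustrative, not from the paper.
        td_error = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        return theta + lr * td_error * phi(s)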

References

Showing 1-10 of 29 references

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

TLDR
This work presents a framework for improving on MPC with model-free reinforcement learning (RL), and shows how errors from inaccurate models in MPC and from value-function estimation in RL can be balanced against each other.
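
One way to formalize this balance, sketched here in the spirit of the paper rather than as its exact estimator, is a TD(λ)-style blend of model-based value estimates at different horizons:

    Q^{\lambda}(s,a) \;=\; (1-\lambda)\sum_{h=1}^{H-1}\lambda^{h-1} Q_h(s,a) \;+\; \lambda^{H-1} Q_H(s,a)

where Q_h is the h-step model rollout bootstrapped with the learned value function: λ near 0 leans on the learned value function (robust to model error), while λ near 1 leans on the model (robust to value-estimation error).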

Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts

TLDR
This paper uses offline optimization to find the limit-cycle solution of an infinite-horizon average-cost optimal-control task, and computes a local quadratic approximation of the value function around this limit cycle, which is used as the terminal cost of an online MPC.
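
Concretely, the terminal cost is a local quadratic model of the value function along the limit cycle; in generic form (notation ours, not the paper's):

    V(x) \;\approx\; V(\bar{x}_\phi) + g_\phi^{\top}(x - \bar{x}_\phi) + \tfrac{1}{2}(x - \bar{x}_\phi)^{\top} H_\phi (x - \bar{x}_\phi)

where \bar{x}_\phi is the limit-cycle state at phase φ, and g_φ, H_φ are the value gradient and Hessian computed offline.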

Sampling-based algorithms for optimal motion planning using closed-loop prediction

TLDR
This work describes CL-RRT#, which leverages ideas from the RRT# algorithm and a variant of the RRT algorithm that generates trajectories using closed-loop prediction, and shows the benefits of the proposed approach in an autonomous-driving scenario.

An integrated system for real-time model predictive control of humanoid robots

TLDR
An integrated system based on real-time model predictive control (MPC) is applied to the full dynamics of a humanoid robot. This is made possible by the speed of the new physics engine (MuJoCo), the efficiency of the trajectory optimization algorithm, and the contact-smoothing methods developed for the purpose of control optimization.

Information Theoretic Model Predictive Q-Learning

TLDR
This work presents a novel theoretical connection between information-theoretic MPC and entropy-regularized RL, and develops a Q-learning algorithm that can leverage biased models. The proposed algorithm is validated on sim-to-sim control tasks, demonstrating improvements over optimal control and over reinforcement learning from scratch.
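
The connection runs through the free energy of the trajectory cost, a standard quantity in information-theoretic MPC (shown here as general background, not the paper's exact derivation):

    F(s) \;=\; -\lambda \,\log\, \mathbb{E}_{\tau \sim p(\cdot\,|\,s)}\!\left[\exp\!\left(-\tfrac{1}{\lambda}\, S(\tau)\right)\right]

where S(τ) is the trajectory cost and λ the temperature; F lower-bounds the KL-regularized control cost and plays the role of a soft (entropy-regularized) value function.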

Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control

TLDR
A plan online and learn offline (POLO) framework is proposed for the setting where an agent with an internal model needs to continually act and learn in the world. The work shows how trajectory optimization can be used to perform temporally coordinated exploration in conjunction with estimating uncertainty in value function approximation.
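
One common way to turn ensemble value estimates into an exploration-friendly, optimistic value (in the spirit of what the TLDR describes; details may differ from the paper) is a log-sum-exp softmax:

    import numpy as np

    def optimistic_value(ensemble_values, beta=1.0):
        # Log-sum-exp softmax over K value estimates: when the ensemble
        # agrees this is roughly the mean, while disagreement inflates the
        # estimate, rewarding visits to uncertain states. Names are ours.
        v = np.asarray(ensemble_values, dtype=float)
        m = v.max()  # shift by the max for numerical stability
        return m + np.log(np.mean(np.exp(beta * (v - m)))) / beta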

An Online Learning Approach to Model Predictive Control

TLDR
This paper proposes a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm designed for non-stationary setups. The analysis provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms.
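
For context, a dynamic mirror descent update on the parameters θ of the control distribution takes the generic form (standard DMD, not necessarily the paper's exact instantiation):

    \theta_{t+1} \;=\; \Phi\!\left(\arg\min_{\theta}\; \eta\,\langle \nabla \ell_t(\theta_t),\, \theta \rangle + D_{\psi}(\theta \,\|\, \theta_t)\right)

where ℓ_t is the per-round control cost, D_ψ is a Bregman divergence, and Φ is a shift model accounting for the receding horizon; different choices of ℓ_t and D_ψ recover different sampling-based MPC variants.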

Information theoretic MPC for model-based reinforcement learning

TLDR
An information-theoretic model predictive control algorithm is introduced that is capable of handling complex cost criteria and general nonlinear dynamics, using multi-layer neural networks as dynamics models to solve model-based reinforcement learning tasks.
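
A generic sketch of the exponentially weighted update at the heart of such information-theoretic (MPPI-style) controllers; the array shapes and names are ours, not the authors':

    import numpy as np

    def mppi_update(sampled_seqs, costs, lam=1.0):
        # sampled_seqs: (N, H, action_dim) perturbed control sequences;
        # costs: (N,) total trajectory cost of each sequence.
        w = np.exp(-(costs - costs.min()) / lam)  # softmin weights, stabilized
        w /= w.sum()
        # New nominal control sequence: cost-weighted average of the samples.
        return np.tensordot(w, sampled_seqs, axes=1)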

Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning

TLDR
This paper proposes Truncated HORizon Policy Search (THOR), a method that searches for policies maximizing the total reshaped reward over a finite planning horizon when the oracle is sub-optimal. It experimentally demonstrates that a gradient-based implementation of THOR can achieve superior performance compared to RL and IL baselines.
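
The reshaping uses the oracle value function V̂ as a potential (written here in generic potential-based form; the paper's notation may differ):

    r'(s_t, a_t, s_{t+1}) \;=\; r(s_t, a_t) + \gamma \hat{V}(s_{t+1}) - \hat{V}(s_t)

and THOR searches for a policy maximizing \sum_{t=0}^{k-1} \gamma^t r' over a k-step horizon; with a perfect oracle k = 1 suffices, and a larger k compensates for oracle sub-optimality.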

Optimal kinodynamic motion planning using incremental sampling-based methods

TLDR
It is shown that the RRT* algorithm, equipped with any local steering procedure that satisfies a certain condition, converges to an optimal solution almost surely while maintaining the same properties as the standard RRT algorithm.