Data-driven Rollout for Deterministic Optimal Control

Authors: Yuchao Li, Karl Henrik Johansson, Jonas Mårtensson
Published in: 2021 60th IEEE Conference on Decision and Control (CDC)

We consider deterministic infinite horizon optimal control problems with nonnegative stage costs. We draw inspiration from the learning model predictive control scheme designed for continuous dynamics and iterative tasks, and propose a rollout algorithm that relies on sampled data generated by some base policy. The proposed algorithm is based on value and policy iteration ideas, and applies to deterministic problems with arbitrary state and control spaces, and arbitrary dynamics. It admits…
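
The idea of rollout with sampled data can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It is a minimal one-step lookahead that uses cost-to-go values recorded along a base-policy trajectory, and only considers controls whose successor state appears in that data. The dynamics `f`, stage cost `g`, control set `CONTROLS`, and the helper names are all hypothetical, chosen only for illustration.

```python
# Toy deterministic problem (all names here are illustrative assumptions):
# the state is a nonnegative integer, the origin is cost-free and absorbing.

CONTROLS = (1, 2)  # hypothetical control set: step sizes toward the origin


def f(x, u):
    """Hypothetical dynamics: move u steps toward the origin."""
    return max(x - u, 0)


def g(x, u):
    """Nonnegative stage cost: one unit per step until the origin is reached."""
    return 0 if x == 0 else 1


def base_policy(x):
    """A deliberately suboptimal base policy: always take the small step."""
    return 1


def sample_costs(policy, x0, max_steps=1000):
    """Run `policy` from x0 and record the cost-to-go at every visited state."""
    states, stage_costs = [x0], []
    while states[-1] != 0:
        if len(stage_costs) >= max_steps:
            raise ValueError("trajectory did not reach the origin")
        x = states[-1]
        u = policy(x)
        stage_costs.append(g(x, u))
        states.append(f(x, u))
    data = {states[-1]: 0}  # the origin has zero cost-to-go
    cost_to_go = 0
    for x, c in zip(reversed(states[:-1]), reversed(stage_costs)):
        cost_to_go += c
        data[x] = cost_to_go
    return data


def rollout_policy(x, data):
    """One-step lookahead restricted to successors that appear in the data."""
    best_u, best_q = None, float("inf")
    for u in CONTROLS:
        nxt = f(x, u)
        if nxt in data:
            q = g(x, u) + data[nxt]
            if q < best_q:
                best_u, best_q = u, q
    if best_u is None:
        raise ValueError("no sampled successor state")
    return best_u


data = sample_costs(base_policy, 10)
base_cost = data[10]
rollout_data = sample_costs(lambda x: rollout_policy(x, data), 10)
rollout_cost = rollout_data[10]
```

In this toy instance the rollout trajectory costs no more than the base trajectory from the same initial state. The states and costs in `rollout_data` could in turn be merged into `data` before the next run, in the spirit of the iterative scheme the abstract alludes to.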
2 Citations

Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control

This paper shows that the principal AlphaZero/TD-Gammon ideas of approximation in value space and rollout apply very broadly to deterministic and stochastic optimal control problems, involving both discrete and continuous search spaces.

Newton’s method for reinforcement learning and model predictive control

  • D. Bertsekas
  • Computer Science
    Results in Control and Optimization
  • 2022



Learning Model Predictive Control for Iterative Tasks. A Data-Driven Control Framework

The control design approach is presented, and it is shown how to recursively construct terminal set and terminal cost from state and input trajectories of previous iterations.

Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC

It is shown that the most common MPC schemes can be viewed as rollout algorithms and are related to policy iteration methods. These schemes are embedded within a new unifying suboptimal control framework, based on a concept of restricted or constrained structure policies, which contains them as special cases.

Rollout Algorithms for Constrained Dynamic Programming

An extension of the rollout algorithm is derived that applies to constrained deterministic dynamic programming problems, and relies on a suboptimal policy, called base heuristic, which under suitable assumptions produces a feasible solution.

Learning Model Predictive Control for Iterative Tasks

The paper presents the control design approach, and shows how to recursively construct terminal set and terminal cost from state and input trajectories of previous iterations of the LMPC.

Dynamic Programming and Optimal Control

The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential

Optimal Infinite-Horizon Feedback Laws for a General Class of Constrained Discrete-Time Systems: Stability and Moving-Horizon Approximations

Stability results are given for a class of feedback systems arising from the regulation of time-varying discrete-time systems using optimal infinite-horizon and moving-horizon feedback laws. The

Negative Dynamic Programming

This paper deals with negative dynamic programming problems, i.e., discrete-time total reward problems with non-positive reward functions and countable state space, and shows that ε-optimal stationary policies exist in general dynamic programming problems if this is true for the imbedded negative model.

Multiagent Reinforcement Learning: Rollout and Policy Iteration

  • D. Bertsekas
  • Computer Science
    IEEE/CAA Journal of Automatica Sinica
  • 2021
This paper discusses autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents.

A Rollout Policy for the Vehicle Routing Problem with Stochastic Demands

The resulting rollout policy appears to be the first computationally tractable algorithm for approximately solving the problem under the reoptimization approach by sequentially improving a given a priori solution by means of a rollout algorithm.