Corpus ID: 233715120

Regret-Optimal Full-Information Control

@article{Sabag2021RegretOptimalFC,
  title={Regret-Optimal Full-Information Control},
  author={Oron Sabag and Gautam Goel and Sahin Lale and Babak Hassibi},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.01244}
}
We consider the infinite-horizon, discrete-time full-information control problem. Motivated by learning theory, as a criterion for controller design we focus on regret, defined as the difference between the LQR cost of a causal controller (which has access only to past and current disturbances) and the LQR cost of a clairvoyant one (which also has access to future disturbances). In the full-information setting, there is a unique optimal non-causal controller that in terms of LQR cost dominates all… 
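As a rough sketch of the criterion described above (the notation here is illustrative, not taken verbatim from the paper): for a disturbance sequence $w = (w_t)$, the regret of a causal controller $K$ is the gap between its LQR cost and that of the optimal non-causal (clairvoyant) controller $K_0$,

$$ R(K, w) \;=\; J_K(w) - J_{K_0}(w), \qquad J_K(w) \;=\; \sum_{t} \big( x_t^\top Q\, x_t + u_t^\top R\, u_t \big), $$

and the regret-optimal causal controller minimizes the worst case of this gap over bounded-energy disturbances, $\sup_{\|w\|_2 \le 1} R(K, w)$.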

Regret-optimal Estimation and Control

TLDR
This work shows that the regret-optimal estimators and controllers can be derived in state-space form using operator-theoretic techniques from robust control and presents tight, data-dependent bounds on the regret incurred by the algorithms in terms of the energy of the disturbances.

Regret-Optimal Filtering

TLDR
The regret-optimal estimator is the causal estimator that minimizes the worst-case regret across all bounded-energy noise sequences and is represented as a finite-dimensional state-space whose parameters can be computed by solving three Riccati equations and a single Lyapunov equation.
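The computational recipe mentioned above rests on standard discrete-time Riccati and Lyapunov solves. As a generic, hedged illustration (placeholder matrices, not the paper's three specific Riccati equations or its Lyapunov equation), these building blocks are available in SciPy:

    # Hedged sketch: generic discrete-time Riccati and Lyapunov solves with SciPy.
    # The matrices below are placeholders, not the equations from the cited paper.
    import numpy as np
    from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])      # illustrative state matrix
    B = np.array([[0.0],
                  [1.0]])           # illustrative input matrix
    Q = np.eye(2)                   # state weight
    R = np.eye(1)                   # input weight

    # Discrete algebraic Riccati equation: A'PA - P - A'PB (R + B'PB)^{-1} B'PA + Q = 0
    P = solve_discrete_are(A, B, Q, R)

    # Resulting state-feedback gain and closed-loop matrix
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    A_cl = A - B @ K

    # Discrete Lyapunov equation: A_cl X A_cl' - X + Q = 0
    X = solve_discrete_lyapunov(A_cl, Q)

    print("Riccati solution P:\n", P)
    print("Lyapunov solution X:\n", X)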

Safe Control with Minimal Regret

TLDR
This paper presents an efficient optimization-based approach for computing a finite-horizon robustly safe control policy that minimizes dynamic regret, in the sense of the loss relative to the optimal sequence of control actions selected in hindsight by a clairvoyant controller.

Optimal Competitive-Ratio Control

TLDR
A regret-optimal control framework with weight functions, which can also be utilized for practical purposes, is formulated, revealing an interesting relation between the explicit solutions that now exist for both competitive control paradigms.

A System Level Approach to Regret Optimal Control

TLDR
An optimisation-based method is presented for synthesising a dynamic-regret-optimal controller for linear systems with potentially adversarial disturbances and known or adversarial initial conditions; the proposed framework also allows state and input constraint satisfaction to be guaranteed.

Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control

TLDR
It is shown that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs by carefully prescribing an early exploration strategy and a policy update rule, thereby solving the open problem posed in Abeille and Lazaric (2018); a novel lower bound on the probability that TS provides an optimistic sample is also developed.

Online estimation and control with optimal pathlength regret

TLDR
The key idea in the derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest.

Supplementary Materials for “Regret-Optimal Filtering”

  • Mathematics
  • 2021
The main objective of the Supplementary Materials file is to provide detailed derivations and proofs for the main results in the paper. It has three main sections. In the first section, we present the…

References

Showing 1–10 of 23 references

Regret-Optimal Controller for the Full-Information Problem

TLDR
The regret-optimal control problem can be reduced to a Nehari extension problem, i.e., to approximating an anticausal operator with a causal one in the operator norm. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the H2 and the H∞ optimal controllers, and generally has H∞ and H2 costs that are simultaneously close to their optimal values.
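As a compact restatement of the reduction described in this summary (symbols are illustrative, not the paper's notation): writing the strictly anticausal part arising from the clairvoyant solution as $T_{\mathrm{anti}}$, the Nehari step seeks the causal operator $K_{\mathrm{c}}$ that is closest in operator norm,

$$ \inf_{K_{\mathrm{c}}\ \text{causal}} \; \| T_{\mathrm{anti}} - K_{\mathrm{c}} \|_{\mathrm{op}}, $$

i.e., the best causal approximation of an anticausal operator, from which the regret-optimal causal controller is then recovered.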

Regret-optimal measurement-feedback control

TLDR
It is shown that in the measurement-feedback setting, unlike in the full-information setting, there is no single offline controller which outperforms every other offline controller on every disturbance, and a new $H_2$-optimal offline controller is proposed as a benchmark for the online controller to compete against.

Regret-Optimal Filtering

TLDR
The regret-optimal estimator is the causal estimator that minimizes the worst-case regret across all bounded-energy noise sequences and is represented as a finite-dimensional state-space whose parameters can be computed by solving three Riccati equations and a single Lyapunov equation.

The Power of Linear Controllers in LQR Control

TLDR
The Linear Quadratic Regulator framework considers the problem of regulating a linear dynamical system perturbed by environmental noise; this work fully characterizes the optimal offline policy and shows that it has a recursive form in terms of the optimal online policy and future disturbances.

Regret Bounds for the Adaptive Control of Linear Quadratic Systems

TLDR
The construction of the confidence set is based on recent results from online least-squares estimation and leads to an improved worst-case regret bound for the proposed algorithm; this is the first time that a regret bound is derived for the LQ control problem.

Explore More and Improve Regret in Linear Quadratic Regulators

TLDR
A framework for adaptive control is proposed that exploits the characteristics of linear dynamical systems and deploys additional exploration in the early stages of agent-environment interaction to guarantee that stabilizing controllers are designed sooner.

Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis

TLDR
This paper designs online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations, and provides a fundamental limit on the dynamic regret of any online algorithm by considering linear quadratic tracking problems.

Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

TLDR
This work presents the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem of adaptive control of the Linear Quadratic Regulator, where an unknown linear system is controlled subject to quadratic costs.

Logarithmic Regret for Online Control

TLDR
It is shown that the optimal regret in this fundamental setting can be significantly smaller, scaling as polylog(T), achieved by two different efficient iterative methods, online gradient descent and online natural gradient.
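As a generic, hedged illustration of the first of these methods (notation assumed, not taken from the paper): online gradient descent maintains controller parameters $M_t$ and, after observing the round-$t$ cost $\ell_t$, updates

$$ M_{t+1} \;=\; \Pi_{\mathcal{M}}\!\big( M_t - \eta\, \nabla_M \ell_t(M_t) \big), $$

where $\Pi_{\mathcal{M}}$ projects back onto the feasible parameter set $\mathcal{M}$ and $\eta$ is a step size.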

Logarithmic Regret for Adversarial Online Control

TLDR
A new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances is introduced, giving the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are given by known quadratic functions.