Input perturbations for adaptive control and learning

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Adaptive Control of Quadratic Costs in Linear Stochastic Differential Equations

An easy-to-implement algorithm for balancing exploration versus exploitation is proposed, followed by theoretical guarantees showing a square-root of time regret bound, and tight results for assuring system stability and for specifying fundamental limits for regret are presented.

On Applications of Bootstrap in Continuous Space Reinforcement Learning

It is shown that bootstrap-based policies achieve a square root scaling of regret with respect to time and results on the accuracy of learning the model’s dynamics are obtained.

Joint Learning-Based Stabilization of Multiple Unknown Linear Systems

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

This work proposes a new Thompson sampling based learning algorithm which exploits the structure of the underlying network and shows that the expected regret of the proposed algorithm is bounded by $\tilde{O}(n \sqrt{T})$, where $n$ is the number of subsystems, $T$ is the time horizon, and the $\tilde{O}(\cdot)$ notation hides logarithmic terms in $n$ and $T$.

Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

This work presents the first model estimation method with finite-time guarantees in both open- and closed-loop system identification, together with adaptive control online learning (AdaptOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps.

Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

It is shown that a randomized certainty equivalent policy addresses the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations and whose operating cost is quadratic.

Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

This work establishes square-root of time regret bounds, indicating that a randomized certainty equivalent policy learns optimal control actions fast from a single state trajectory, and sheds light on fundamental challenges of continuous-time reinforcement learning.

Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

The crux of the algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems, that achieves the optimal dynamic regret of $O(V_T^{2/5} T^{3/5})$.

Safe Linear-Quadratic Dual Control with Almost Sure Performance Guarantee

This paper proposes an online algorithm that guarantees the asymptotic optimality of the controller in the almost sure sense and proposes a safe switched control strategy that falls back to a known conservative but stable controller when the actual state deviates significantly from the target state.

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

New efficient algorithms are presented that achieve regret that scales only (poly)logarithmically with the number of steps in two scenarios: when only the state transition matrix $A$ is unknown, and when the optimal policy satisfies a certain non-degeneracy condition.



Regret Bounds for the Adaptive Control of Linear Quadratic Systems

The construction of the confidence set is based on recent results from online least-squares estimation and leads to an improved worst-case regret bound for the proposed algorithm; this is the first time that a regret bound is derived for the LQ control problem.

On Optimality of Adaptive Linear-Quadratic Regulators

A novel decomposition of adaptive policies is introduced, which establishes a sharp expression for the regret of an arbitrary policy in terms of the deviations from the optimal regulator, and shows that adaptive policies based on a slight modification of the widely used Certainty Equivalence scheme are optimal.

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

This work presents an adaptive control scheme that achieves a regret bound of ${O}(p \sqrt{T})$, apart from logarithmic factors, and has prominent applications in the emerging area of computational advertising.

Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited

In adaptive control, a standard approach is to resort to the so-called certainty equivalence principle which consists of generating some standard parameter estimate and then using it in the control

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

This work bridges the gap, showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.

Stabilization of Discrete-Time Nonlinear Uncertain Systems by Feedback Based on LS Algorithm

The result shows that if a certain polynomial criterion is satisfied, the system can be stabilized by feedback based on the LS algorithm for Gaussian distributed noise and unknown parameters, which answers the question of what the fundamental limitations of discrete-time adaptive nonlinear control are.


New insight is achieved in this paper by the formalization of a general cost-biased principle named "Bet On the Best" (BOB), which may work in situations in which more standard implementations of the cost-biasing idea may fail to achieve optimality.

Finite-Time Adaptive Stabilization of Linear Systems

Using the novel method of random linear feedbacks, high probability guarantees for finite-time stabilization of linear systems with unknown dynamics are established; they hold in remarkably general settings because only a minimal set of assumptions is required.

Dynamic Programming and Optimal Control

The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential