# Input perturbations for adaptive control and learning

@article{Faradonbeh2018InputPF,
  title={Input perturbations for adaptive control and learning},
  author={Mohamad Kazem Shirani Faradonbeh and Ambuj Tewari and George Michailidis},
  journal={Automatica},
  year={2020},
  volume={117},
  pages={108950}
}
• Published 10 November 2018
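
The scheme suggested by the title (perturbing a certainty-equivalence controller with randomized inputs so that least-squares identification and control proceed together) can be sketched on a toy linear-quadratic system. The sketch below is a minimal illustration rather than the paper's exact algorithm: the system matrices, the decaying perturbation schedule, and the re-estimation interval are all arbitrary choices made for the example.

```python
import numpy as np

def dare_gain(A, B, Q, R, iters=500):
    # Certainty-equivalence LQR gain via fixed-point iteration on the
    # discrete-time Riccati equation (standard textbook recursion).
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K  # feedback law u = -K x

rng = np.random.default_rng(0)
n, d = 2, 1                                  # toy dimensions (illustrative)
A_true = np.array([[1.0, 0.2], [0.0, 0.9]])  # unknown to the controller
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(d)

x = np.zeros((n, 1))
K = np.zeros((d, n))                         # no model information yet
X, U, Xn = [], [], []                        # regression data for least squares
for t in range(1, 2001):
    # Randomized input perturbation with decaying variance: enough
    # excitation to identify (A, B), little enough to keep the extra cost low.
    u = -K @ x + t ** -0.25 * rng.standard_normal((d, 1))
    x_next = A_true @ x + B_true @ u + 0.1 * rng.standard_normal((n, 1))
    X.append(x.ravel()); U.append(u.ravel()); Xn.append(x_next.ravel())
    x = x_next
    if t % 50 == 0:                          # refresh the CE controller
        Z = np.hstack([np.array(X), np.array(U)])
        Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
        A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]
        K = dare_gain(A_hat, B_hat, Q, R)
```

With persistent excitation the least-squares estimate approaches $[A\ B]$ and the certainty-equivalence gain approaches the optimal LQR gain; the trade-off is that larger perturbations speed up identification but add to the control cost.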

## Citations

• 2021: An easy-to-implement algorithm for balancing exploration versus exploitation is proposed, with theoretical guarantees establishing a square-root-of-time regret bound, together with tight results ensuring system stability and characterizing fundamental limits on regret.
• 2019 IEEE 58th Conference on Decision and Control (CDC): It is shown that bootstrap-based policies achieve a square-root scaling of regret with respect to time, and results on the accuracy of learning the model's dynamics are obtained.
• IEEE Transactions on Control of Network Systems, 2022: A new Thompson-sampling-based learning algorithm is proposed that exploits the structure of the underlying network; the expected regret is bounded by $\tilde{O}(n \sqrt{T})$, where $n$ is the number of subsystems, $T$ is the time horizon, and $\tilde{O}(\cdot)$ hides logarithmic factors in $n$ and $T$.
• NeurIPS, 2020: Presents the first model estimation method with finite-time guarantees in both open- and closed-loop system identification, together with adaptive control online learning (AdaptOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps.
• It is shown that randomized certainty-equivalent policies address the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations with quadratic operating cost; square-root-of-time regret bounds are established, indicating that the policy learns optimal control actions quickly from a single state trajectory, and shedding light on fundamental challenges of continuous-time reinforcement learning.
• Proc. ACM Meas. Anal. Comput. Syst., 2022: The crux of the algorithm is an adaptive non-stationarity detection strategy, building on an approach recently developed for contextual multi-armed bandit problems, which achieves the optimal dynamic regret of $O(V_T^{2/5} T^{3/5})$.
• 2021: This paper proposes an online algorithm that guarantees asymptotic optimality of the controller in the almost-sure sense, together with a safe switched control strategy that falls back to a known conservative but stable controller when the actual state deviates significantly from the target state.
• ICML, 2020: New efficient algorithms are presented that achieve regret scaling only (poly)logarithmically with the number of steps in two scenarios: when only the state transition matrix $A$ is unknown, and when the optimal policy satisfies a certain non-degeneracy condition.
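
The Thompson-sampling idea mentioned above admits a compact scalar illustration: maintain a Gaussian posterior over the unknown dynamics $(a, b)$, draw a sample, and apply the LQR gain computed for the sampled model. The sketch below is a toy version, not the networked algorithm of the cited paper; the prior, noise level, horizon, and the guard on the sampled $b$ are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, b_true = 0.9, 0.5      # unknown scalar dynamics: x' = a x + b u + w
q, r, sigma = 1.0, 1.0, 0.1    # quadratic cost weights and noise std

def lqr_gain(a, b, iters=200):
    # Scalar discrete-time Riccati recursion for the sampled model.
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Gaussian posterior over theta = (a, b), ridge-style: precision V and
# precision-weighted data vector m; the posterior mean is V^{-1} m.
V = np.eye(2)
m = np.zeros(2)
x = 0.0
for t in range(3000):
    mean = np.linalg.solve(V, m)
    cov = sigma ** 2 * np.linalg.inv(V)
    cov = (cov + cov.T) / 2                         # symmetrize numerically
    a_s, b_s = rng.multivariate_normal(mean, cov)   # Thompson sample
    if abs(b_s) < 1e-2:
        b_s = 1e-2        # guard: keep the sampled model controllable
    u = -lqr_gain(a_s, b_s) * x
    x_next = a_true * x + b_true * u + sigma * rng.standard_normal()
    z = np.array([x, u])
    V += np.outer(z, z)   # Bayesian linear-regression update
    m += z * x_next
    x = x_next
```

Randomness in the sampled gain supplies the exploration that a deterministic certainty-equivalence controller lacks; as the posterior concentrates, the sampled models, and hence the control law, settle down.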
