• Corpus ID: 220496101

# Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

@article{Abeille2020EfficientOE,
title={Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation},
author={Marc Abeille and Alessandro Lazaric},
journal={ArXiv},
year={2020},
volume={abs/2007.06482}
}
• Published 12 July 2020
• Computer Science
• ArXiv
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ and cast it into a constrained *extended* LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation for which we prove…
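As a minimal illustration of the optimism-in-the-face-of-uncertainty principle the abstract builds on (not the paper's Lagrangian-relaxation algorithm itself), the sketch below solves a scalar discrete-time Riccati equation for each candidate system in a hypothetical confidence box around least-squares estimates `(a_hat, b_hat)`, and picks the dynamics with the smallest optimal cost-to-go. All numerical values are illustrative assumptions.

```python
import numpy as np

def riccati_scalar(a, b, q=1.0, r=1.0, iters=500):
    """Fixed-point iteration for the scalar discrete-time Riccati equation:
    P = q + a^2 P - (a b P)^2 / (r + b^2 P)."""
    P = q
    for _ in range(iters):
        P = q + a**2 * P - (a * b * P)**2 / (r + b**2 * P)
    return P

# Hypothetical confidence intervals around least-squares estimates (a_hat, b_hat).
a_hat, b_hat, radius = 0.8, 0.5, 0.1
candidates = [(a, b)
              for a in np.linspace(a_hat - radius, a_hat + radius, 21)
              for b in np.linspace(b_hat - radius, b_hat + radius, 21)]

# Optimism in the face of uncertainty: among all plausible dynamics, pick the
# pair whose optimal cost-to-go P is smallest (the "best plausible world").
P_opt, (a_opt, b_opt) = min((riccati_scalar(a, b), (a, b)) for a, b in candidates)
print(a_opt, b_opt, P_opt)
```

The optimistic choice lands on the least contractive-cost corner of the box (smallest |a|, largest |b|), which is exactly the behavior the extended-LQR formulation encodes through its additional control variable.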

## Citations

• Computer Science
ICML
• 2021
This work studies task-guided exploration and determines precisely what an agent must learn about its environment in order to complete a particular task, and establishes that certainty-equivalence decision making is instance- and task-optimal.
• Computer Science
NeurIPS
• 2020
This paper proposes a practical optimistic-exploration algorithm, which enlarges the input space with hallucinated inputs that can exert as much control as the epistemic uncertainty in the model affords, and shows how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms and different probabilistic models.
• Computer Science
2022 IEEE 61st Conference on Decision and Control (CDC)
• 2022
This work revisits the Thompson sampling-based learning algorithm for controlling an unknown linear system with quadratic cost and shows that a careful choice of Tmin allows it to recover the regret bound under a milder technical condition about the closed loop system.
• Computer Science
COLT
• 2022
It is shown that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs by carefully prescribing an early exploration strategy and a policy update rule, thereby solving the open problem posed in Abeille and Lazaric (2018).
• Computer Science
2022 IEEE 61st Conference on Decision and Control (CDC)
• 2022
These are the first results to generalize regret bounds of LQG systems to packet-drop networked control models.
• Computer Science
AISTATS
• 2022
This work proposes an algorithm that certifies fast stabilization of the underlying system by combining a sophisticated RL exploration policy with an isotropic exploration strategy, achieving both fast stabilization and improved regret.
• Computer Science
ArXiv
• 2022
It is shown that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs, thereby solving the open problem posed in Abeille and Lazaric (2018) and developing a novel lower bound on the probability that the TS provides an optimistic sample.
• Mathematics
2022 American Control Conference (ACC)
• 2022
Robustness aspects of certainty-equivalent model-based optimal control for MJS with a quadratic cost function are investigated, given that the uncertainty in the system matrices and in the Markov transition matrix is bounded (by a given constant and by η, respectively).
• Computer Science
ArXiv
• 2022
This paper presents local minimax regret lower bounds for adaptively controlling linear-quadratic-Gaussian (LQG) systems and establishes that a nontrivial class of partially observable systems, essentially those that are over-actuated, satisfies these conditions, thus providing a √T lower bound that is also valid for partially observable systems.
• Computer Science, Mathematics
L4DC
• 2021
Local asymptotic minimax regret lower bounds for adaptive linear quadratic regulators are presented, and it is shown that if the parametrization induces an uninformative optimal policy, logarithmic regret is impossible and the rate is at least of order √T in the time horizon.

## References

SHOWING 1-10 OF 18 REFERENCES

• Computer Science, Mathematics
ICML
• 2018
A novel bound on the regret due to policy switches is obtained, which holds for LQ systems of any dimensionality and it allows updating the parameters and the policy at each step, thus overcoming previous limitations due to lazy updates.
• Computer Science, Mathematics
ICML
• 2020
New upper and lower bounds are proved demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
• Computer Science
ArXiv
• 2019
The results show that certainty equivalent control with $\varepsilon$-greedy exploration achieves $\tilde{\mathcal{O}}(\sqrt{T})$ regret in the adaptive LQR setting, yielding the first guarantee of a computationally tractable algorithm that achieves nearly optimal regret for adaptive LQR.
• Computer Science, Mathematics
COLT
• 2011
The construction of the confidence set is based on recent results from online least-squares estimation and leads to an improved worst-case regret bound for the proposed algorithm; this is the first time that a regret bound has been derived for the LQ control problem.
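The self-normalized confidence set this reference builds on can be sketched numerically. Below, a regularized least-squares estimate of a hypothetical parameter vector is paired with the standard confidence radius β from online least-squares analysis; the dimensions, noise level, and regularizer are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lam, sigma, delta, S = 3, 500, 1.0, 0.1, 0.05, 1.0

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)          # ensure ||theta|| <= S
X = rng.normal(size=(n, d))             # covariates (in LQ: state-input features)
y = X @ theta + sigma * rng.normal(size=n)

# Regularized least squares: theta_hat = V^{-1} X^T y with V = lam*I + X^T X.
V = lam * np.eye(d) + X.T @ X
theta_hat = np.linalg.solve(V, X.T @ y)

# Self-normalized confidence radius:
# beta = sigma * sqrt(2 log(det(V)^{1/2} / (det(lam I)^{1/2} delta))) + sqrt(lam)*S
logdet = np.linalg.slogdet(V)[1]
beta = sigma * np.sqrt(2 * (0.5 * logdet - 0.5 * d * np.log(lam)
                            - np.log(delta))) + np.sqrt(lam) * S

# With probability >= 1 - delta, theta lies in {th : ||th - theta_hat||_V <= beta}.
err = theta - theta_hat
dist_V = np.sqrt(err @ V @ err)
print(dist_V, beta)
```

The ellipsoid `{th : ||th - theta_hat||_V <= beta}` is exactly the kind of confidence region over which OFU-LQ performs its optimistic optimization.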
• Computer Science, Mathematics
NeurIPS
• 2018
This work presents the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem of adaptive control of the Linear Quadratic Regulator, where an unknown linear system is controlled subject to quadratic costs.
• Computer Science, Mathematics
ArXiv
• 2017
Finite time high probability regret bounds that are optimal up to logarithmic factors are established and high probability guarantees for a stabilization algorithm based on random linear feedbacks are provided.
• Computer Science
ArXiv
• 2018
It is shown that perturbed Greedy guarantees non-asymptotic regret bounds of (nearly) square-root magnitude w.r.t. time, and high probability bounds on both the regret and the learning accuracy under arbitrary input perturbations are established.
• Computer Science
J. Mach. Learn. Res.
• 2008
This work presents a reinforcement learning algorithm with total regret O(DS√AT) after T steps for any unknown MDP with S states, A actions per state, and diameter D, and proposes a new parameter: an MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps.
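The diameter parameter in this reference is easy to compute for a small deterministic MDP: it is the largest over all ordered state pairs of the shortest hitting time under the best policy. The toy 4-state ring below is a hypothetical example, not from the paper.

```python
from collections import deque

# A tiny deterministic MDP: next_state[s][a] gives the successor of state s
# under action a (a hypothetical 4-state ring with a "stay" action).
next_state = {
    0: [1, 0],
    1: [2, 1],
    2: [3, 2],
    3: [0, 3],
}

def shortest_hitting_time(src, dst):
    """BFS over states: fewest steps to reach dst from src under the best policy."""
    if src == dst:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        s, d = frontier.popleft()
        for t in next_state[s]:
            if t == dst:
                return d + 1
            if t not in seen:
                seen.add(t)
                frontier.append((t, d + 1))
    return float("inf")  # dst unreachable: the diameter is infinite

states = list(next_state)
diameter = max(shortest_hitting_time(s, t)
               for s in states for t in states if s != t)
print(diameter)  # ring of 4 states: the worst pair needs 3 steps
```

For stochastic MDPs the definition uses expected hitting times, so BFS is only a sketch of the deterministic special case.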
• Mathematics
2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
• 2017
It is shown under some conditions on the prior distribution that the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by Õ(√T).
• Computer Science, Mathematics
ArXiv
• 2011
The regret bound of the Upper Confidence Bound algorithm of Auer et al. (2002) is improved: its regret is, with high probability, a problem-dependent constant, and new tighter confidence sets for the least-squares estimate are constructed.