Corpus ID: 52940653

Model-Free Linear Quadratic Control via Reduction to Expert Prediction

@inproceedings{AbbasiYadkori2019ModelFreeLQ,
  title={Model-Free Linear Quadratic Control via Reduction to Expert Prediction},
  author={Yasin Abbasi-Yadkori and Nevena Lazic and Csaba Szepesvari},
  booktitle={AISTATS},
  year={2019}
}
Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0…
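For context, the following is a standard formulation of the LQ control problem and of the regret notion referenced in the abstract; this is a generic sketch of the usual conventions, and the exact noise and cost assumptions of the paper are not visible in the truncated abstract above.

$$x_{t+1} = A x_t + B u_t + w_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t,$$

where $w_t$ is zero-mean noise and $Q, R \succ 0$. The regret of a controller over $T$ steps is then measured against the best linear state-feedback policy:

$$R_T = \sum_{t=1}^{T} c_t - T J^*, \qquad J^* = \min_{K} \limsup_{T\to\infty} \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T} c_t\Big] \ \text{under } u_t = -K x_t,$$

so a bound of the form $O(T^{\xi+2/3})$ says that the average excess cost per step vanishes as $T$ grows.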
Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator
TLDR: A simple adaptive procedure based on $\varepsilon$-greedy exploration, which relies on approximate PI as a sub-routine and obtains a regret guarantee improving upon a recent result of Abbasi-Yadkori et al., is constructed.
Learning the model-free linear quadratic regulator via random search
TLDR: This paper examines the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters and provides theoretical bounds on the convergence rate and sample complexity of a random search method.
The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint
TLDR: This work shows that for policy evaluation, a simple model-based plugin method requires asymptotically fewer samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of the state dimension.
Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret
TLDR: This work presents the first model-free algorithm that achieves similar regret guarantees, relying on an efficient policy gradient scheme and a novel, tighter analysis of the cost of exploration in policy space in this setting.
Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems
TLDR: A regret upper bound of $O(\sqrt{T})$ is proved for adaptive control of linear quadratic Gaussian (LQG) systems, where $T$ is the time horizon of the problem.
Using Reinforcement Learning for Model-free Linear Quadratic Control with Process and Measurement Noises
TLDR: A completely model-free reinforcement learning algorithm is proposed to solve the LQ problem, in which each policy is greedy with respect to all previous value functions, and it is proved that the algorithm produces stable policies provided the estimation errors remain small.
Average-reward model-free reinforcement learning: a systematic review and literature mapping
TLDR: An updated review of work in model-free reinforcement learning is provided and extended to cover policy-iteration and function approximation methods (in addition to the value-iteration and tabular counterparts), identifying and discussing opportunities for future work.
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
TLDR: This work characterizes the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random, and shows that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps.
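As a rough illustration of the kind of method analysed in this line of work, below is a minimal sketch of a generic two-point (zeroth-order) gradient step on a policy gain matrix. The cost oracle `lqr_cost`, the smoothing radius, and the step size are illustrative placeholders under simplifying assumptions, not the paper's actual procedure or constants.

import numpy as np

def two_point_gradient_step(K, lqr_cost, radius=0.05, step_size=1e-3, rng=None):
    # One step of a generic two-point zeroth-order method on a policy gain K.
    # `lqr_cost(K)` is assumed to return the (possibly noisy) cost of running
    # the linear policy u_t = -K x_t; it is a placeholder, not the paper's API.
    rng = np.random.default_rng() if rng is None else rng
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)                      # random direction, unit Frobenius norm
    c_plus = lqr_cost(K + radius * U)           # cost at the two symmetric perturbations
    c_minus = lqr_cost(K - radius * U)
    d = K.size                                  # number of policy parameters
    grad_est = d * (c_plus - c_minus) / (2.0 * radius) * U
    return K - step_size * grad_est             # plain gradient step on the smoothed cost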
Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems
TLDR: This work proves that the Gauss-Newton method and the natural policy gradient method converge to the optimal state-feedback controller for MJLS at a linear rate if initialized at a controller which stabilizes the closed-loop dynamics in the mean-square sense.
Continuous Control with Contexts, Provably
TLDR: This paper studies how to build a decoder for the fundamental continuous control task, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments, and presents a simple algorithm that uses an upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off.

References

Showing 1-10 of 60 references
Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
TLDR: This work presents an adaptive control scheme that achieves a regret bound of $O(p\sqrt{T})$, apart from logarithmic factors, and has prominent applications in the emerging area of computational advertising.
Thompson Sampling for Linear-Quadratic Control Problems
TLDR: The regret of Thompson sampling in the frequentist setting is analyzed, resulting in an overall regret of $O(T^{2/3})$, which is significantly worse than the regret achieved by the optimism-in-the-face-of-uncertainty algorithm in LQ control problems.
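For orientation, here is a minimal sketch of the generic posterior-sampling (Thompson sampling) control loop for an unknown LQ system: sample a model from the posterior, act optimally for the sampled model, and update the data. `sample_posterior` and `env_step` are hypothetical helpers, and the update schedule and confidence machinery analysed in the paper are not reproduced here.

import numpy as np
from scipy.linalg import solve_discrete_are

def thompson_sampling_lq(env_step, sample_posterior, x0, Q, R, horizon):
    # `sample_posterior(data)` is a hypothetical helper returning a sampled (A, B) pair;
    # `env_step(x, u)` returns the next state of the true (unknown) system.
    x, data = x0, []
    for _ in range(horizon):
        A_s, B_s = sample_posterior(data)          # draw a plausible model from the posterior
        P = solve_discrete_are(A_s, B_s, Q, R)     # Riccati solution for the sampled model
        K = np.linalg.solve(R + B_s.T @ P @ B_s, B_s.T @ P @ A_s)
        u = -K @ x                                 # certainty-equivalent action for the sample
        x_next = env_step(x, u)
        data.append((x, u, x_next))                # grow the dataset used by the posterior
        x = x_next
    return data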
Global Convergence of Policy Gradient Methods for Linearized Control Problems
TLDR: This work bridges the gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in the relevant problem-dependent quantities) with regard to their sample and computational complexities.
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
TLDR: A finite-sample, high-probability bound on the performance of the computed policy is derived, which depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept, the approximation power of the function set, and the controllability properties of the MDP.
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
TLDR: Batch reinforcement learning problems in continuous space (expected total discounted-reward Markovian Decision Problems) are considered, where the training data consists of a single sample path (trajectory) of some behaviour policy.
Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator
TLDR: This work gives the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within $\varepsilon$-relative error.
Regret Bounds for the Adaptive Control of Linear Quadratic Systems
TLDR: The construction of the confidence set is based on recent results from online least-squares estimation and leads to an improved worst-case regret bound for the proposed algorithm; this is the first time that a regret bound has been derived for the LQ control problem.
PAC adaptive control of linear systems
TLDR: This work proposes a learning algorithm for a special case of reinforcement learning where the environment can be described by a linear system, and shows that the control law produced by the algorithm has a value close to that of an optimal policy relative to the magnitude of the initial state of the system.
Fast rates for online learning in Linearly Solvable Markov Decision Processes
TLDR: The smoothness of the control cost enables the simple algorithm of following the leader to achieve a regret of order $\log^2 T$ after $T$ rounds, vastly improving on the best known regret bound of order $T^{3/4}$ for this setting.
Bayesian Optimal Control of Smoothly Parameterized Systems
TLDR: A lazy version of the so-called posterior sampling method, a method that goes back to Thompson and Strens, is presented, allowing for a single algorithm and a single analysis for a wide range of problems, such as finite MDPs or linear quadratic regulation.