
Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

@article{Preiss2019AnalyzingTV,
  title={Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator},
  author={James A. Preiss and S{\'e}bastien M. R. Arnold and Chen-Yu Wei and M. Kloft},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01249}
}
We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments. 
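As a concrete illustration of the estimator under study, here is a minimal sketch (in NumPy, with illustrative names and parameter values that are not taken from the paper's experiments) of the REINFORCE score-function gradient estimate for a linear-quadratic problem with a linear Gaussian policy u_t = K x_t + noise:

```python
# Minimal sketch (not the authors' code): REINFORCE gradient estimate for a
# linear-quadratic problem with a linear Gaussian policy u_t = K x_t + noise.
import numpy as np

def reinforce_gradient(A, B, Q, R, K, sigma, x0, horizon, num_rollouts, rng):
    """Monte Carlo estimate of the policy gradient d E[cost] / d K."""
    m = B.shape[1]
    grad = np.zeros_like(K)
    for _ in range(num_rollouts):
        x = x0.copy()
        score = np.zeros_like(K)   # sum_t d log pi(u_t | x_t) / d K
        cost = 0.0
        for _ in range(horizon):
            noise = sigma * rng.standard_normal(m)
            u = K @ x + noise
            cost += x @ Q @ x + u @ R @ u
            # Gaussian policy: d log pi / d K = outer(u - Kx, x) / sigma^2
            score += np.outer(noise, x) / sigma**2
            x = A @ x + B @ u
        grad += cost * score       # REINFORCE (score-function) estimator
    return grad / num_rollouts

rng = np.random.default_rng(0)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.zeros((1, 2))
g = reinforce_gradient(A, B, Q, R, K, sigma=0.1, x0=np.ones(2),
                       horizon=20, num_rollouts=100, rng=rng)
```

The variance of this Monte Carlo estimate across rollouts is the quantity the paper's bounds characterize in terms of the dynamics, cost, and noise parameters.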
Decentralized Policy Gradient Method for Mean-Field Linear Quadratic Regulator with Global Convergence
The scalability of multi-agent reinforcement learning methods to large populations of agents is drawing increasing attention in both practice and theory. We consider the basic yet important…
Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach
TLDR
This paper considers a dynamical system with both linear and non-linear components and develops a novel approach that uses the linear model to define a warm start for a model-free policy gradient method, showing that this hybrid approach outperforms the model-based controller while avoiding convergence issues.
Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation
TLDR
The benchmark problem of discrete-time linear quadratic regulation (LQR) is revisited, and it is shown that policy iteration for LQR is inherently robust to small errors and enjoys local input-to-state stability.

References

Showing 1-10 of 18 references
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
TLDR
This paper considers variance reduction methods that were developed for Monte Carlo estimates of integrals, and gives bounds for the estimation error of the gradient estimates for both baseline and actor-critic algorithms, in terms of the sample size and mixing properties of the controlled system.
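As a rough illustration of the baseline idea discussed in this reference, the sketch below (a hypothetical helper, assuming per-rollout costs and score functions have already been collected, e.g. by the rollout code above) subtracts a constant baseline before averaging; the mean-cost baseline is an assumption for simplicity, not the optimal baseline analyzed in the paper:

```python
# Illustrative constant-baseline variance reduction for the score-function
# estimator; b is taken here as the mean rollout cost (a simplifying choice).
import numpy as np

def baseline_reinforce(costs, scores):
    """costs: (N,) rollout returns; scores: (N, ...) per-rollout score functions.
    Returns a baseline-subtracted policy gradient estimate."""
    costs = np.asarray(costs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    b = costs.mean()                      # simple constant baseline
    weights = costs - b                   # centered returns
    # Broadcast the scalar weights over the trailing parameter dimensions.
    weights = weights.reshape(-1, *([1] * (scores.ndim - 1)))
    return np.mean(weights * scores, axis=0)
```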
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
TLDR
This work characterizes the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random, and shows that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps.
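For intuition, a hedged sketch of a two-point, derivative-free gradient estimate of a policy cost J(K); the names (J, K, r, U) are illustrative and the smoothing radius r is a free parameter, not a value from the paper:

```python
# Sketch of a two-point zeroth-order gradient estimate: J is any callable that
# returns the (possibly noisy) cost of gain matrix K, e.g. via rollouts.
import numpy as np

def two_point_gradient(J, K, r, rng):
    """Estimate grad J(K) from two cost evaluations at K +/- r*U."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)                 # random direction on the unit sphere
    d = K.size                             # number of policy parameters
    return d * (J(K + r * U) - J(K - r * U)) / (2.0 * r) * U
```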
The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint
TLDR
This work shows that for policy evaluation, a simple model-based plugin method requires asymptotically fewer samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of the state dimension.
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
TLDR
It is proved that actor-critic finds a globally optimal pair of actor and critic at a linear rate of convergence; this may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and often solved using heuristics.
Trust Region Policy Optimization
TLDR
A method for optimizing control policies with guaranteed monotonic improvement is described; making several approximations to the theoretically justified scheme yields a practical algorithm called Trust Region Policy Optimization (TRPO).
Global Convergence of Policy Gradient Methods for Linearized Control Problems
TLDR
This work bridges the gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
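For reference, the clipped form of the surrogate objective described in the PPO paper can be sketched as follows (a NumPy illustration with assumed argument names, not the reference implementation; the 0.2 clipping range is the paper's typical value):

```python
# Sketch of a PPO-style clipped surrogate objective.
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1-eps, 1+eps) * A) over samples."""
    ratio = np.exp(logp_new - logp_old)   # pi_theta(a|s) / pi_theta_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```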
A Tour of Reinforcement Learning: The View from Continuous Control
  • B. Recht
  • Computer Science, Mathematics
  • Annual Review of Control, Robotics, and Autonomous Systems
  • 2019
This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It reviews the general formulation, terminology, and…
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
TLDR
This article will discuss how a generalization of the reinforcement learning or optimal control problem, which is sometimes termed maximum entropy reinforcement learning, is equivalent to exact probabilistic inference in the case of deterministic dynamics, and variational inference in the case of stochastic dynamics.
Linear Optimal Control Systems
TLDR
This book attempts to reconcile modern linear control theory with classical control theory by presenting design methods, employing modern techniques, for obtaining control systems that stand up to the requirements that have been so well developed in the classical expositions of control theory.