Corpus ID: 236447468

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

@article{Jing2021AsynchronousDR,
  title={Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent},
  author={Gangshan Jing and He Bai and Jemin George and Aranya Chakrabortty and Piyush Kumar Sharma},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.12416}
}
Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the…
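The gradient estimation issue mentioned in the abstract can be illustrated with a generic two-point zeroth-order estimator. The sketch below is not the algorithm proposed in the paper; it is a standard textbook-style estimator, written under the assumption of a simple quadratic test cost, and every name in it (`two_point_zo_gradient`, `averaged_estimate`, `quadratic_cost`, `radius`) is hypothetical. It shows the two properties the abstract points to: the random direction `u` has the same dimension as the full variable `x`, and each estimate evaluates the global cost.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def two_point_zo_gradient(cost, x, radius, rng):
    """One two-point zeroth-order gradient estimate of `cost` at `x`.

    The perturbation u is sampled in the full dimension of x, and the
    global cost is evaluated twice (assumed setup, for illustration only).
    """
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    x_plus = [xi + radius * ui for xi, ui in zip(x, u)]
    x_minus = [xi - radius * ui for xi, ui in zip(x, u)]
    scale = dim * (cost(x_plus) - cost(x_minus)) / (2.0 * radius)
    return [scale * ui for ui in u]


def averaged_estimate(cost, x, radius, num_samples, seed=0):
    """Average many single-sample estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_zo_gradient(cost, x, radius, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


def quadratic_cost(x):
    """Hypothetical test cost f(x) = sum_i x_i^2, with gradient 2x."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    x0 = [1.0, -2.0, 0.5]
    print(averaged_estimate(quadratic_cost, x0, radius=0.1, num_samples=20000))
```

For this quadratic cost the estimator is unbiased for the true gradient `2x`, so the averaged output should be close to `[2.0, -4.0, 1.0]`. A single sample is noisy, and the noise grows with the dimension of `x`, which is the variance concern the abstract raises for large-scale networks.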

References

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
TLDR
This work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation and provable convergence guarantees; the algorithms can be implemented in an online fashion.
Zeroth-Order Stochastic Block Coordinate Type Methods for Nonconvex Optimization
TLDR
Classes of zeroth-order stochastic block coordinate type methods are proposed, and for the first time a two-phase BCCG method is developed to achieve an $(\epsilon, \Lambda)$-solution of the nonconvex composite optimization problem.
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
TLDR
This work bridges the gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.
A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update
TLDR
An algorithm for nonconvex optimization is proposed; its global convergence (of the whole sequence) to a critical point is established, and its asymptotic convergence rate is given and numerically demonstrated.
On the Exponential Number of Connected Components for the Feasible Set of Optimal Decentralized Control Problems
TLDR
This work considers a measure of problem complexity in terms of connectivity and shows that there is no polynomial upper bound on the number of connected components for the set of static stabilizing decentralized controllers.
ZONE: Zeroth-Order Nonconvex Multiagent Optimization Over Networks
TLDR
This paper develops efficient distributed algorithms for optimizing a class of nonconvex problems under the challenging setting where each agent can only access the zeroth-order information of its local functions.
Distributed LQR Design for Identical Dynamically Decoupled Systems
TLDR
The design procedure proposed in this paper illustrates how stability of the large-scale system is related to the robustness of local controllers and the spectrum of a matrix representing the desired sparsity pattern of the distributed controller design problem.
Improving the Convergence Rate of One-Point Zeroth-Order Optimization using Residual Feedback
TLDR
This paper proposes a novel one-point feedback scheme that queries the function value only once at each iteration and estimates the gradient using the residual between two consecutive feedback points, and shows that this scheme achieves the same convergence rate as that of ZO with two-point feedback with uncontrollable data samples.
Computing Stabilizing Linear Controllers via Policy Iteration
TLDR
This paper gives a model-free, off-policy reinforcement learning algorithm for computing a stabilizing controller for deterministic LQR problems with unknown dynamics and cost matrices.
On the Linear Convergence of Random Search for Discrete-Time LQR
Model-free reinforcement learning techniques directly search over the parameter space of controllers. Although this often amounts to solving a nonconvex optimization problem, for benchmark control…