Corpus ID: 245650882

Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms

Adrian Redder, Arunselvan Ramaswamy, Holger Karl
We present sufficient conditions that ensure convergence of the multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm. DDPG is an example of the actor-critic paradigm, one of the most popular paradigms in Deep Reinforcement Learning (DeepRL) for tackling continuous action spaces. In the setting considered here, each agent observes a part of the global state space in order to take local actions, for which it receives local rewards. For every agent, DDPG trains a local actor (policy) and… 
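The actor-critic pattern described above can be illustrated with a minimal single-agent sketch using linear function approximation. The toy dynamics, features, step sizes, and names below are illustrative assumptions, not the paper's setup:

```python
import random

# Toy 1-D control problem (illustrative): state s, deterministic linear
# actor a = theta * s, quadratic critic Q(s, a) = w0*s^2 + w1*s*a + w2*a^2,
# reward r = -(s^2 + a^2), next state s' = 0.5*s + a.

def features(s, a):
    return (s * s, s * a, a * a)

def q_value(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))

def train(steps=5000, gamma=0.9, alpha=0.01, beta=0.001, seed=0):
    rng = random.Random(seed)
    theta = 0.0                     # actor parameter
    w = [0.0, 0.0, 0.0]             # critic parameters
    s = rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        a = theta * s + 0.1 * rng.gauss(0.0, 1.0)   # exploration noise
        r = -(s * s + a * a)
        s_next = max(-2.0, min(2.0, 0.5 * s + a))   # keep the toy state bounded
        a_next = theta * s_next
        # Critic update (TD(0)) on the faster timescale alpha
        td = r + gamma * q_value(w, s_next, a_next) - q_value(w, s, a)
        for i, fi in enumerate(features(s, a)):
            w[i] += alpha * td * fi
        # Actor update (deterministic policy gradient) on the slower timescale beta
        dq_da = w[1] * s + 2.0 * w[2] * a           # dQ/da at the taken action
        theta += beta * dq_da * s                   # chain rule: da/dtheta = s
        s = s_next
    return theta, w
```

Because the critic learns on a faster timescale than the actor, the actor effectively ascends the gradient of an approximately converged Q-function; this two-timescale structure is what convergence arguments for DDPG-style algorithms typically rest on.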


Multi-agent Policy Gradient Algorithms for Cyber-physical Systems with Lossy Communication
This work decentralizes the well-known deep deterministic policy gradient algorithm using a communication network. It illustrates the convergence of the algorithm, and the effect of lossy communication on the rate of convergence, for a two-agent flow control problem in which the agents exchange their local information over a delaying wireless network.
Distributed gradient-based optimization in the presence of dependent aperiodic communication
It is shown that convergence is guaranteed provided the random variables associated with the age-of-information (AoI) processes are stochastically dominated by a random variable with a finite first moment, which improves on previous requirements that moments higher than the first be bounded.


Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
This analysis establishes the first global optimality and convergence guarantees for neural policy gradient methods by relating the suboptimality of stationary points to the representation power of the neural actor and critic classes, and by proving the global optimality of all stationary points under mild regularity conditions.
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
Deep Q-Learning: Theoretical Insights From an Asymptotic Analysis
This work provides a theoretical analysis of a popular version of deep Q-learning under realistic and verifiable assumptions and proves an important result on the convergence of the algorithm, characterizing the asymptotic behavior of the learning process.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Practical sufficient conditions for convergence of distributed optimisation algorithms over communication networks with interference
The objective is to formulate a representative network model and provide practically verifiable network conditions that ensure convergence of distributed algorithms in the presence of interference and possibly unbounded delay and to show that a penalty-based gradient descent algorithm can be used to solve a rich class of stochastic, constrained, distributed optimisation problems.
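The penalty-based idea mentioned above can be sketched on a toy problem (the objective, penalty weight, and step size here are invented for illustration, not taken from the cited work): a constraint x >= 1 is folded into the objective x^2 as a quadratic penalty and minimized by plain gradient descent.

```python
# Illustrative penalty method: minimize f(x) = x^2 subject to x >= 1
# by minimizing F(x) = x^2 + rho * max(0, 1 - x)^2 with gradient descent.

def grad_penalized(x: float, rho: float) -> float:
    g = 2.0 * x                      # gradient of the objective x^2
    if x < 1.0:                      # constraint x >= 1 is violated
        g += -2.0 * rho * (1.0 - x)  # gradient of the penalty term
    return g

def solve(rho: float, steps: int = 2000, lr: float = 0.005) -> float:
    x = 0.0
    for _ in range(steps):
        x -= lr * grad_penalized(x, rho)
    return x
```

The minimizer of the penalized objective is rho / (1 + rho), which approaches the constrained optimum x = 1 as the penalty weight grows; the distributed variants studied in the paper add communication between agents on top of this basic mechanism.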
The actor-critic algorithm as multi-time-scale stochastic approximation
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-time-scale stochastic approximation. Convergence analysis, approximation…
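The two-time-scale structure can be made concrete with step-size schedules in which the critic's steps dominate the actor's. The specific exponents below are a common textbook choice, not taken from the cited paper:

```python
def critic_step(n: int) -> float:
    """Faster timescale a(n): not summable, but square-summable."""
    return (n + 1) ** -0.6

def actor_step(n: int) -> float:
    """Slower timescale b(n): chosen so that b(n) / a(n) -> 0."""
    return (n + 1) ** -1.0
```

Both schedules satisfy the usual Robbins-Monro conditions, and the ratio b(n)/a(n) = (n+1)^{-0.4} vanishes, so the critic sees the actor as quasi-static while the actor sees an almost-equilibrated critic.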
Searching for Activation Functions
The experiments show that the best discovered activation function, f(x) = x · sigmoid(βx), which is named Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.
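Swish is straightforward to implement from the formula above; a minimal scalar version (the function name and default β are our choices) might look like:

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))
```

With beta = 1 this is x · sigmoid(x) (also known as SiLU); as beta grows, the function approaches ReLU.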
The Complexity of Decentralized Control of Markov Decision Processes
This work considers decentralized control of Markov decision processes and gives complexity bounds on the worst-case running time for algorithms that find optimal solutions and describes generalizations that allow for decentralized control.
Gaussian Error Linear Units (GELUs)
An empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations is performed and performance improvements are found across all considered computer vision, natural language processing, and speech tasks.
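A minimal sketch of the exact GELU via the error function (this is the standard definition, not anything specific to the cited evaluation):

```python
import math

def gelu(x: float) -> float:
    """GELU activation: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, GELU weights its input by the probability that a standard normal variable falls below it, giving a smooth, non-monotonic curve near zero.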