Exploiting Fast Decaying and Locality in Multi-Agent MDP with Tree Dependence Structure

  • Guannan Qu, N. Li
  • Published 2019
  • Computer Science, Mathematics
  • 2019 IEEE 58th Conference on Decision and Control (CDC)
This paper considers a multi-agent Markov Decision Process (MDP) with n agents, where each agent i is associated with a state s_i and an action a_i taking values from a finite set. Though the global state space size and action space size are exponential in n, we impose local dependence structures and focus on local policies that depend only on local states, and we propose a method that finds nearly optimal local policies in polynomial time (in n) when the dependence structure is a one…
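To make the abstract's key notion concrete, here is a minimal sketch (our own toy, not the paper's method) of a "local" policy: each agent i chooses its action from its own state s_i alone, so the joint policy factorizes and never enumerates the exponential global state space.

```python
# Illustrative sketch, not the paper's algorithm: a joint policy built
# from per-agent lookup tables, each keyed only by that agent's local state.

def make_local_policy(tables):
    """tables[i] maps agent i's local state to its local action."""
    def joint_policy(global_state):
        # global_state is a tuple (s_1, ..., s_n); each agent acts locally.
        return tuple(tables[i][s] for i, s in enumerate(global_state))
    return joint_policy

# Three agents with binary local states/actions (hypothetical toy tables).
policy = make_local_policy([{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}])
print(policy((0, 1, 0)))  # -> (1, 1, 1); each component uses one agent's state
```

The point of the factorization is that storing n small tables costs O(n) rather than the exponential cost of a table over joint states.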
Scalable Planning in Multi-Agent MDPs
An approximate transition dependence property, called δ-transition dependence, is proposed, along with a metric for quantifying how far an MMDP deviates from transition independence, and a polynomial-time algorithm is developed that achieves a provable bound on the global optimum when the reward functions are monotone increasing and submodular in the agent actions.
Provably Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games
We study cooperative multi-agent reinforcement learning in episodic Markov games with n agents. While the global state and action spaces typically grow exponentially in n in this setting, …
Distributed Reinforcement Learning in Multi-Agent Networked Systems
This work proposes a Scalable Actor Critic framework that applies in settings where the dependencies are non-local and provides a finite-time error bound that shows how the convergence rate depends on the depth of the dependencies in the network.
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
This paper proposes a Scalable Actor-Critic (SAC) method that can learn a near-optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network.
Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems
A Scalable Actor-Critic (SAC) framework is proposed that exploits the network structure and finds a localized policy that is a $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$ with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
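The $\kappa$-hop neighborhood this abstract refers to is a standard graph notion; a short sketch (helper names are ours, not the paper's) computes it by breadth-first search, which is why complexity can scale with the neighborhood size rather than the whole network.

```python
from collections import deque

# Illustrative sketch: the set of nodes within kappa hops of a source node,
# computed by breadth-first search over an adjacency-list graph.

def k_hop_neighborhood(adj, source, kappa):
    """Return the set of nodes within kappa hops of `source` in graph `adj`."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue  # don't expand past kappa hops
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

# A path graph 0-1-2-3-4: the 1-hop neighborhood of node 2 is {1, 2, 3}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(k_hop_neighborhood(path, 2, 1)))  # [1, 2, 3]
```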
Q-Learning for Mean-Field Controls
This paper develops a model-free kernel-based Q-learning algorithm (CDD-Q) and shows that its convergence rate and sample complexity are independent of the number of agents, and can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.
Multiagent Reinforcement Learning: Rollout and Policy Iteration
  • D. Bertsekas
  • Computer Science
  • IEEE/CAA Journal of Automatica Sinica
  • 2021
This paper discusses autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents.
A Microscopic Pandemic Simulator for Pandemic Prediction Using Scalable Million-Agent Reinforcement Learning
Microscopic epidemic models are powerful tools for government policy makers to predict and simulate epidemic outbreaks, which can capture the impact of individual behaviors on the macroscopic…
Socially Aware Robot Obstacle Avoidance Considering Human Intention and Preferences
Co-MDP is developed: a robotic decision framework that can utilize this human model and maximize the joint utility between the human and robot; an experimental design is also presented for evaluating human acceptance of obstacle avoidance algorithms.
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.


Distributed Policy Evaluation Under Multiple Behavior Strategies
A fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment and is efficient, with linear complexity in both computation time and memory footprint.
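The neighbor-only communication pattern described above can be sketched as a consensus step (a toy of ours, not the paper's algorithm): each agent replaces its estimate with the average over itself and its immediate neighbors, so no agent ever needs global information.

```python
# Illustrative sketch: one round of neighbor averaging on an
# adjacency-list graph; repeated rounds drive estimates toward agreement.

def consensus_step(values, adj):
    """Each node replaces its value with the mean over itself and its neighbors."""
    new = {}
    for node, v in values.items():
        group = [v] + [values[nb] for nb in adj[node]]
        new[node] = sum(group) / len(group)
    return new

adj = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path graph
vals = {0: 0.0, 1: 3.0, 2: 6.0}
print(consensus_step(vals, adj))    # {0: 1.5, 1: 3.0, 2: 4.5}
```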
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
This work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees, and can be implemented in an online fashion.
Efficient Solution Algorithms for Factored MDPs
This paper presents two approximate solution algorithms that exploit structure in factored MDPs by using an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables.
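The representation this abstract describes can be sketched in a few lines (our notation, not the paper's algorithms): the value function is a weighted sum of basis functions, each reading only a small "scope" of state variables, which is what makes the representation compact.

```python
# Illustrative sketch: V(x) ~= sum_j w_j * h_j(x[scope_j]), where each
# scope_j is a small tuple of state-variable indices.

def factored_value(weights, bases, scopes, state):
    """Evaluate a linear-in-basis value function at `state`."""
    total = 0.0
    for w, h, scope in zip(weights, bases, scopes):
        total += w * h(tuple(state[i] for i in scope))
    return total

# Hypothetical 3-variable state; two basis functions, each touching one variable.
bases = [lambda x: float(x[0]), lambda x: float(x[0] * 2)]
scopes = [(0,), (2,)]
weights = [1.0, 0.5]
print(factored_value(weights, bases, scopes, (3, 7, 4)))  # 1*3 + 0.5*8 = 7.0
```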
Optimizing spread dynamics on graphs by message passing
It is shown that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
A Q-learning-like algorithm for finding optimal policies and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
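For context, here is the standard tabular Q-learning update that the "Q-learning-like" algorithm builds on (in the paper's minimax-Q, the max over actions is replaced by the value of a matrix game solved by linear programming; this toy omits that step).

```python
# Illustrative sketch of one tabular Q-learning step:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s,a)."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Hypothetical two-state, two-action table.
Q = {"s0": {"up": 0.0, "down": 0.0}, "s1": {"up": 1.0, "down": 0.0}}
print(q_update(Q, "s0", "up", r=1.0, s_next="s1"))  # 0.5*(1 + 0.9*1.0) = 0.95
```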
Nash Q-Learning for General-Sum Stochastic Games
This work extends Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games, and implements an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
This paper proposes a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively, and proves that the proposed algorithm converges to the optimal solution at a global geometric rate.
The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems
This work distinguishes reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts, and proposes alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
Analysis of Temporal-Difference Learning with Function Approximation
We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we…
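The algorithm analyzed here is TD(0) with a linear approximator $V_\theta(s) = \theta^\top \phi(s)$; a minimal sketch of one update along a single transition (feature map and step size are toy choices of ours, not from the paper):

```python
# Illustrative sketch of one TD(0) step with linear function approximation:
#   theta <- theta + alpha * (r + gamma * V(s') - V(s)) * phi(s)

def td0_step(theta, phi_s, phi_next, r, alpha=0.1, gamma=0.9):
    """Return updated weights after one transition (s, r, s')."""
    v_s = sum(t * p for t, p in zip(theta, phi_s))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = r + gamma * v_next - v_s          # the TD error
    return [t + alpha * delta * p for t, p in zip(theta, phi_s)]

theta = [0.0, 0.0]
theta = td0_step(theta, phi_s=[1.0, 0.0], phi_next=[0.0, 1.0], r=1.0)
print(theta)  # TD error is 1.0, so theta becomes [0.1, 0.0]
```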
A Comprehensive Survey of Multiagent Reinforcement Learning
The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.