Exploiting Fast Decaying and Locality in Multi-Agent MDP with Tree Dependence Structure

@inproceedings{Qu2019ExploitingFD,
  title={Exploiting Fast Decaying and Locality in Multi-Agent MDP with Tree Dependence Structure},
  author={Guannan Qu and N. Li},
  booktitle={2019 IEEE 58th Conference on Decision and Control (CDC)},
  year={2019},
  pages={6479--6486}
}
  • Guannan Qu, N. Li
  • Published 2019
  • Mathematics, Computer Science
  • 2019 IEEE 58th Conference on Decision and Control (CDC)
This paper considers a multi-agent Markov Decision Process (MDP), where there are n agents and each agent i is associated with a state s_i and an action a_i taking values from a finite set. Though the global state space size and action space size are exponential in n, we impose local dependence structures and focus on local policies that only depend on local states, and we propose a method that finds nearly optimal local policies in polynomial time (in n) when the dependence structure is a one…
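The scaling claim in the abstract can be illustrated with a small counting sketch. This is not the paper's method; it only compares the size of the joint state space with the size of a set of local policy tables. The function names are hypothetical, and the sketch assumes every agent has the same number of local states and actions and that a local policy conditions only on the agent's own state s_i (the abstract does not spell out the exact meaning of "local states").

```python
def global_space_size(n: int, local_size: int) -> int:
    """Joint space size when each of n agents has `local_size` local values.

    The joint state (or joint action) is a tuple with one entry per agent,
    so the count is local_size ** n, which is exponential in n.
    """
    return local_size ** n


def local_policy_table_entries(n: int, num_local_states: int, num_local_actions: int) -> int:
    """Total entries across n stochastic local policy tables.

    Assumption (illustrative only): agent i stores one probability per
    (s_i, a_i) pair, so the total grows linearly in n.
    """
    return n * num_local_states * num_local_actions


if __name__ == "__main__":
    # Compare growth for a few agent counts with 2 local states and 3 local actions.
    for n in (5, 10, 20):
        print(n, global_space_size(n, 2), local_policy_table_entries(n, 2, 3))
```

Under these assumptions the joint state count doubles with every added agent, while the local policy representation grows by a fixed number of entries per agent.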
Scalable Planning in Multi-Agent MDPs
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems
Q-Learning for Mean-Field Controls
Multiagent Reinforcement Learning: Rollout and Policy Iteration
  • D. Bertsekas
  • Computer Science
  • IEEE/CAA Journal of Automatica Sinica
  • 2021
Socially Aware Robot Obstacle Avoidance Considering Human Intention and Preferences
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

References

Distributed Policy Evaluation Under Multiple Behavior Strategies
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Efficient Solution Algorithms for Factored MDPs
Optimizing spread dynamics on graphs by message passing
Markov Games as a Framework for Multi-Agent Reinforcement Learning
Nash Q-Learning for General-Sum Stochastic Games
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Actor-Critic Algorithms
Analysis of Temporal-Difference Learning with Function Approximation