Corpus ID: 238856844

HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism

Zhiwei Xu, Yunpeng Bai, Bin Zhang, Dapeng Li, Guoliang Fan
Multi-agent reinforcement learning often suffers from an action space that grows exponentially with the number of agents. This paper proposes HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems. To address the instability arising from the concurrent optimization of high-level and low-level policies, and from the concurrent optimization across agents, it introduces a dual coordination mechanism of inter-layer…

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning

With this method, agents interact with a learned virtual environment and evaluate the current state value according to imagined future states, giving agents foresight under any multi-agent value decomposition method.

Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning

A set of asynchronous multi-agent actor-critic methods is formulated that allows agents to directly optimize asynchronous policies in three paradigms: decentralized learning, centralized learning, and centralized training for decentralized execution.

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
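The additive decomposition described above can be sketched in a few lines (a toy illustration under assumed notation, not the paper's network: plain Python Q-tables stand in for the per-agent value functions):

```python
# Toy sketch of value decomposition in the VDN style: the team value is the
# sum of per-agent utilities, so with an additive mixer the greedy joint
# action factorizes into independent per-agent argmaxes.

def vdn_total(per_agent_qs):
    """Sum agent-wise Q-values into a joint (team) Q-value."""
    return sum(per_agent_qs)

def greedy_joint_action(q_tables):
    """Each agent argmaxes its own Q-table independently."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_tables]

# Two agents, two actions each; each list stands in for Q_i(o_i, ·).
q_tables = [[0.1, 0.9], [0.7, 0.2]]
joint = greedy_joint_action(q_tables)                       # [1, 0]
q_tot = vdn_total(q[a] for q, a in zip(q_tables, joint))    # ≈ 1.6
```

The point of the additivity assumption is exactly this factorization: each agent can act greedily on its own utility while the team still acts greedily on the joint value.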

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms, along with a new experience replay mechanism to alleviate the sparse transitions at the high level of abstraction and the non-stationarity of multi-agent learning.

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

This work presents an actor-critic algorithm that trains decentralized policies in multi-agent settings using centrally computed critics that share an attention mechanism, which selects relevant information for each agent at every timestep. Compared to recent approaches, this enables more effective and scalable learning in complex multi-agent environments.
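The per-agent attention step can be sketched as standard scaled dot-product attention over the other agents' encodings (a minimal sketch with illustrative names, not the paper's API):

```python
import math

def attention(query, keys, values):
    """Weight other agents' value vectors for one agent via
    scaled dot-product attention over their key vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Attention-weighted sum of the other agents' value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

With identical keys the weights are uniform, so the critic simply averages the other agents' information; distinct keys let each agent focus on the teammates most relevant to it at that timestep.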

Inter-Level Cooperation in Hierarchical Reinforcement Learning

It is hypothesized that improved cooperation between the internal agents of a hierarchy can simplify the credit assignment problem from the perspective of the high-level policies, leading to significant training improvements in situations where intricate sets of action primitives must be performed to improve performance.

Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning

It is found that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.

MAVEN: Multi-Agent Variational Exploration

A novel approach called MAVEN is proposed that hybridises value and policy-based methods by introducing a latent space for hierarchical control, which allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

A new factorization method for MARL, QTRAN, is proposed that is free from the structural constraints (such as additivity and monotonicity) imposed by prior factorization methods, and takes a new approach: transforming the original joint action-value function into an easily factorizable one that shares the same optimal actions.

Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery

A two-level hierarchical multi-agent reinforcement learning (MARL) algorithm with unsupervised skill discovery enables the emergence of useful skills and cooperative team play; the interpretability of the learned skills shows the promise of the proposed method for achieving human-AI cooperation in team sports games.

QPLEX: Duplex Dueling Multi-Agent Q-Learning

A novel MARL approach, duplex dueling multi-agent Q-learning (QPLEX), uses a duplex dueling network architecture to factorize the joint value function and encodes the IGM principle into the neural network architecture, enabling efficient value function learning.
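The IGM (Individual-Global-Max) principle that QPLEX encodes can be stated as follows (notation assumed, following the value-factorization literature: $\boldsymbol{\tau}$ denotes the joint action-observation history and $\tau_i$ agent $i$'s local history):

```latex
\arg\max_{\mathbf{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a})
= \Big( \arg\max_{a_1} Q_1(\tau_1, a_1), \; \ldots, \; \arg\max_{a_n} Q_n(\tau_n, a_n) \Big)
```

That is, the greedy joint action of the factorized team value must coincide with the tuple of each agent's individually greedy action, so that decentralized execution remains consistent with centralized training.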