Corpus ID: 219530516

Skill Discovery of Coordination in Multi-agent Reinforcement Learning

Shuncheng He, Jianzhun Shao, Xiangyang Ji
Unsupervised skill discovery drives intelligent agents to explore an unknown environment without a task-specific reward signal, and the agents acquire various skills that may be useful when they adapt to new tasks. In this paper, we propose "Multi-agent Skill Discovery" (MASD), a method for discovering skills for coordination patterns of multiple agents. The proposed method aims to maximize the mutual information between a latent code Z representing skills and the combination of the states… 
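The mutual-information objective sketched in the abstract is commonly optimized via a variational lower bound, in which a learned discriminator q(z|s) tries to recover the skill code from the (joint) state, and its log-probability supplies an intrinsic reward. The snippet below is a toy illustration of that reward signal under a uniform skill prior p(z), not the paper's implementation; the discriminator logits are assumed given.

```python
import numpy as np

def skill_intrinsic_reward(logits, z, n_skills):
    """Variational lower-bound reward: log q(z|s) - log p(z),
    with p(z) assumed uniform over n_skills (a common choice)."""
    shifted = np.exp(logits - logits.max())   # numerically stable softmax
    probs = shifted / shifted.sum()           # q(z|s)
    return np.log(probs[z] + 1e-8) - np.log(1.0 / n_skills)

# If the discriminator identifies the active skill confidently,
# the intrinsic reward is positive:
confident_logits = np.array([4.0, 0.0, 0.0, 0.0])
r = skill_intrinsic_reward(confident_logits, z=0, n_skills=4)
```

Under an uninformative discriminator (uniform q(z|s)) the reward is near zero, so maximizing it drives the agents' joint behaviour to become distinguishable across skill codes.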


VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators

This paper proposes "variational multi-agent policy diversification" (VMAPD), an on-policy framework for discovering diverse policies for coordination patterns of multiple agents, and derives a tractable evidence lower bound (ELBO) on the trajectories of all agents.

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

This paper proposes Diversity-Guided Policy Optimization (DGPO), an on-policy framework for discovering multiple strategies for the same task, and uses diversity objectives to guide a latent-code-conditioned policy to learn a set of diverse strategies in a single training procedure.



Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

MAVEN: Multi-Agent Variational Exploration

A novel approach called MAVEN is proposed that hybridises value and policy-based methods by introducing a latent space for hierarchical control, which allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.

Dynamics-Aware Unsupervised Discovery of Skills

This work proposes an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics, and demonstrates that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.

Diversity is All You Need: Learning Skills without a Reward Function

DIAYN ("Diversity is All You Need") is proposed, a method for learning useful skills without a reward function; it learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
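The monotonicity constraint can be sketched with a toy mixer whose weights are forced nonnegative via abs(), so the joint value can never decrease when any per-agent value increases. This is a minimal sketch under simplifying assumptions: in QMIX the mixing weights are produced by hypernetworks conditioned on the global state and the hidden nonlinearity is an ELU; here they are fixed random matrices with a ReLU.

```python
import numpy as np

def qmix_mix(agent_qs, w1, b1, w2, b2):
    """Toy QMIX-style mixer: the joint Q is monotonic in each per-agent Q
    because all mixing weights are made nonnegative with abs()."""
    hidden = np.maximum(agent_qs @ np.abs(w1) + b1, 0.0)  # ReLU stand-in for ELU
    return hidden @ np.abs(w2) + b2

rng = np.random.default_rng(0)
n_agents, hidden_dim = 3, 8
w1 = rng.normal(size=(n_agents, hidden_dim))
b1 = rng.normal(size=hidden_dim)
w2 = rng.normal(size=(hidden_dim, 1))
b2 = rng.normal(size=1)

q_low = np.array([1.0, 2.0, 3.0])
q_high = q_low + np.array([0.5, 0.0, 0.0])  # raise one agent's value
```

Because every path from an agent's value to the output carries only nonnegative weights, raising `q_low` to `q_high` cannot lower the mixed joint value, which is exactly the property that makes per-agent argmax consistent with joint argmax.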

Learning Multiagent Communication with Backpropagation

A simple neural model is explored, called CommNet, that uses continuous communication for fully cooperative tasks and the ability of the agents to learn to communicate amongst themselves is demonstrated, yielding improved performance over non-communicative agents and baselines.

Multi-Agent Generative Adversarial Imitation Learning

This work proposes a new framework for multi-agent imitation learning in general Markov games, building on a generalized notion of inverse reinforcement learning, and introduces a practical multi-agent actor-critic algorithm with good empirical performance.

Counterfactual Multi-Agent Policy Gradients

A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients is presented, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
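COMA's counterfactual baseline marginalises out a single agent's action while holding the other agents' actions fixed, so the resulting advantage isolates that agent's own contribution. A minimal sketch, assuming the centralised critic already provides this agent's Q-values for each of its action choices:

```python
import numpy as np

def coma_advantage(q_values, policy, action):
    """COMA counterfactual advantage for one agent:
    A(s, u) = Q(s, u) - sum_u' pi(u'|s) * Q(s, u'),
    where only this agent's action is marginalised out."""
    baseline = np.dot(policy, q_values)  # counterfactual baseline
    return q_values[action] - baseline

# Toy example: three actions for one agent, others' actions held fixed.
q = np.array([1.0, 3.0, 2.0])     # critic's Q for each of this agent's actions
pi = np.array([0.2, 0.5, 0.3])    # this agent's policy
adv = coma_advantage(q, pi, action=1)
```

By construction the advantage has zero mean under the agent's own policy, which keeps the policy-gradient estimator centred without changing its expectation.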

Deep Variational Reinforcement Learning for POMDPs

Deep variational reinforcement learning (DVRL) is proposed, which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information.