Policy Distillation and Value Matching in Multiagent Reinforcement Learning

  @inproceedings{wadhwania2019policy,
    title={Policy Distillation and Value Matching in Multiagent Reinforcement Learning},
    author={Samir Wadhwania and Dong-Ki Kim and Shayegan Omidshafiei and Jonathan P. How},
    booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year={2019}
  }
  • Published 15 March 2019
  • Computer Science, Mathematics
Multiagent reinforcement learning (MARL) algorithms have been demonstrated on complex tasks that require coordination among a team of agents. Existing works have focused on sharing information between agents, via centralized critics to stabilize learning or through communication to improve performance, but do not generally consider how information can be shared between agents to address the curse of dimensionality in MARL. We posit that a multiagent problem can be…
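The core idea of distilling several agents' policies into a single shared policy can be illustrated with a minimal KL-based distillation loss. This is a toy sketch, not the paper's actual training objective; the function names and the two-action setup are assumptions for illustration:

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_policies, student_policy):
    """Mean KL from each teacher (agent) policy to the student policy for
    one state; a distilled policy is trained to minimise this quantity,
    averaged over states drawn from the agents' experience."""
    return sum(kl(t, student_policy) for t in teacher_policies) / len(teacher_policies)

# Two homogeneous agents that already agree -> the loss is zero.
teachers = [[0.7, 0.3], [0.7, 0.3]]
print(distillation_loss(teachers, [0.7, 0.3]))  # 0.0
```

When the teachers disagree, the minimiser of this loss is a compromise distribution, which is one intuition for why sharing experience through distillation can combat the curse of dimensionality.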
An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
A novel Multiagent Policy Transfer Framework (MAPTF) is proposed to improve MARL efficiency; it can be easily combined with existing deep RL and MARL approaches, and experimental results show that it significantly boosts the performance of existing methods in both discrete and continuous state spaces.
KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning
This paper proposes a method named “KnowRU” for knowledge reuse that can be easily deployed in the majority of multi-agent reinforcement learning algorithms without complicated hand-coded design, and demonstrates the method's robustness and effectiveness.
KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning
This paper presents “KnowSR”, an adaptation applicable to the majority of multi-agent reinforcement learning (MARL) algorithms, which takes advantage of differences in learning progress between agents and employs knowledge distillation (KD) to share knowledge among them, shortening the training phase.
AB-Mapper: Attention and BicNet Based Multi-agent Path Finding for Dynamic Crowded Environment
This work introduces an algorithm called Attention and BicNet based Multiagent path planning with effective reinforcement (AB-Mapper) under the actor-critic reinforcement learning framework, and proposes a centralized critic network that can selectively allocate attention weights to surrounding agents.
Active collaboration in relative observation for multi-agent visual simultaneous localization and mapping based on Deep Q Network
This article proposes a unique active relative localization mechanism for multi-agent simultaneous localization and mapping, in which an agent to be observed is considered as a task, and the others
Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework
A novel multi-agent transfer learning framework that significantly accelerates the learning process and surpasses state-of-the-art deep RL methods in terms of learning efficiency and final performance in both discrete and continuous action spaces is proposed.


Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
Simultaneously Learning and Advising in Multiagent Reinforcement Learning
A multiagent advising framework where multiple agents can advise each other while learning in a shared environment is proposed and it is shown that the learning process is improved by incorporating this kind of advice.
Learning Hierarchical Teaching in Cooperative Multiagent Reinforcement Learning
The proposed framework solves difficulties faced by prior work on multiagent teaching when operating in domains with long horizons, delayed rewards, and continuous states/actions by leveraging temporal abstraction and deep function approximation.
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
A decentralized single-task learning approach that is robust to concurrent interactions of teammates is introduced, and an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity, is presented.
Coordinated Multi-Agent Imitation Learning
It is shown that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines, and the method integrates unsupervised structure learning with conventional imitation learning.
Cooperative Multi-Agent Learning: The State of the Art
This survey attempts to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics, and finds that this broad view leads to a division of the work into two categories.
Learning Multiagent Communication with Backpropagation
A simple neural model is explored, called CommNet, that uses continuous communication for fully cooperative tasks and the ability of the agents to learn to communicate amongst themselves is demonstrated, yielding improved performance over non-communicative agents and baselines.
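The continuous-communication idea behind CommNet, in which each agent receives the mean of the other agents' hidden states as a message, can be illustrated with a scalar toy step. The scalar weights here stand in for CommNet's learned weight matrices and are purely illustrative:

```python
import math

def commnet_step(hiddens, w_self=0.5, w_comm=0.5):
    """One CommNet-style communication step (scalar toy version): each
    agent's next hidden state combines its own hidden state with the
    mean of the *other* agents' hidden states, then a nonlinearity."""
    n = len(hiddens)
    total = sum(hiddens)
    messages = [(total - h) / (n - 1) for h in hiddens]  # mean of the others
    return [math.tanh(w_self * h + w_comm * c) for h, c in zip(hiddens, messages)]

# Three agents; communication mixes information across all of them.
print(commnet_step([0.0, 1.0, -1.0]))
```

Because the message is an average, the step is differentiable end-to-end, which is what lets the agents learn to communicate by backpropagation.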
Markov Games as a Framework for Multi-Agent Reinforcement Learning
A Q-learning-like algorithm for finding optimal policies is presented, and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
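For a 2x2 zero-sum matrix game such as matching pennies, the probabilistic optimal policy mentioned above has a closed form, which a short sketch can verify. This is standard game theory, not the paper's Q-learning-like algorithm itself, and it assumes the game has no pure-strategy saddle point:

```python
def minimax_2x2(A):
    """Optimal mixed strategy and game value for the row player of a
    2x2 zero-sum game with row-player payoff matrix A, assuming no
    pure-strategy saddle point exists."""
    a, b = A[0]
    c, d = A[1]
    denom = (a - b) + (d - c)
    p = (d - c) / denom              # probability of playing row 0
    value = (a * d - b * c) / denom  # value of the game
    return p, value

# Matching pennies: the optimal policy is probabilistic (50/50), value 0.
p, v = minimax_2x2([[1, -1], [-1, 1]])
print(p, v)  # 0.5 0.0
```

No deterministic policy achieves this value, which is why Markov-game solvers must search over stochastic policies.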
Counterfactual Multi-Agent Policy Gradients
A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients is proposed, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
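COMA's counterfactual baseline, which marginalises a single agent's action out of the centralised Q-function while holding the other agents' actions fixed, can be sketched in a few lines. This is a toy illustration with assumed names, not the paper's implementation:

```python
def counterfactual_advantage(q_values, policy, action):
    """COMA-style advantage for one agent: the joint-action Q-value of
    the taken action minus a baseline that marginalises this agent's
    action out under its own policy, with other agents' actions fixed."""
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline

# Q(s, (u, u_-a)) as agent a's action u varies, others' actions fixed.
q = [1.0, 3.0]   # centralised critic's estimates (illustrative numbers)
pi = [0.5, 0.5]  # agent a's current policy
print(counterfactual_advantage(q, pi, 1))  # 1.0
```

The baseline depends on the agent's policy but not on its sampled action, so subtracting it reduces gradient variance without biasing the policy gradient.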
Learning to Teach in Cooperative Multiagent Reinforcement Learning
Empirical comparisons against state-of-the-art teaching methods show that the teaching agents in LeCTR not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.