Policy Distillation and Value Matching in Multiagent Reinforcement Learning

@article{Wadhwania2019PolicyDA,
  title={Policy Distillation and Value Matching in Multiagent Reinforcement Learning},
  author={Samir Wadhwania and Dong-Ki Kim and Shayegan Omidshafiei and Jonathan P. How},
  journal={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2019},
  pages={8193-8200}
}
  • Samir Wadhwania, Dong-Ki Kim, Shayegan Omidshafiei, Jonathan P. How
  • Published 15 March 2019
  • Computer Science, Mathematics
  • 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Multiagent reinforcement learning (MARL) algorithms have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized critics to stabilize learning or through communication to improve performance, but do not generally consider how information can be shared between agents to address the curse of dimensionality in MARL. We posit that a multiagent problem can be…
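
The two ingredients named in the title can be illustrated as a combined training objective: a KL policy-distillation term that pulls a shared "student" policy toward an agent's "teacher" policy, plus a mean-squared value-matching term on the corresponding value estimates. The snippet below is only a minimal sketch under assumed conditions (discrete actions, per-agent actor-critic networks, PyTorch); the function name distill_and_match and the weight beta are hypothetical and not taken from the paper.

import torch
import torch.nn.functional as F

def distill_and_match(student_logits, teacher_logits,
                      student_values, teacher_values, beta=1.0):
    """Policy-distillation (KL) term plus value-matching (MSE) term."""
    # The teacher's action distribution is treated as a fixed target.
    teacher_probs = F.softmax(teacher_logits, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student): the standard policy-distillation objective.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # Value matching: pull the student's value estimates toward the teacher's.
    value_loss = F.mse_loss(student_values, teacher_values.detach())
    return kl + beta * value_loss

In a homogeneous-team setting, such a loss would be applied per agent on that agent's own observation batch, so the distilled policy could aggregate experience from all teammates.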

Citations

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
  • Tianpei Yang, Weixun Wang, +9 authors Changjie Fan
  • Computer Science
  • 2020
TLDR: A novel Multiagent Policy Transfer Framework (MAPTF) is proposed to improve MARL efficiency; it can be easily combined with existing deep RL and MARL approaches, and experimental results show that it significantly boosts the performance of existing methods in both discrete and continuous state spaces.
KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning
TLDR: This paper proposes a method named “KnowRU” for knowledge reuse, which can be easily deployed in the majority of multi-agent reinforcement learning algorithms without complicated hand-coded design, and demonstrates the robustness and effectiveness of the method.
KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning
TLDR: This paper presents an adaptation method for the majority of multi-agent reinforcement learning (MARL) algorithms, named “KnowSR”, which takes advantage of differences in learning between agents and employs knowledge distillation (KD) to share knowledge among agents and shorten the training phase.
AB-Mapper: Attention and BicNet Based Multi-agent Path Finding for Dynamic Crowded Environment
TLDR: This work introduces an algorithm called Attention and BicNet based Multi-agent path planning with effective reinforcement (AB-Mapper) under the actor-critic reinforcement learning framework, and proposes a centralized critic network that can selectively allocate attention weights to surrounding agents.
Active collaboration in relative observation for multi-agent visual simultaneous localization and mapping based on Deep Q Network
This article proposes a unique active relative localization mechanism for multi-agent simultaneous localization and mapping, in which an agent to be observed is considered as a task, and the others…
Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework
TLDR: A novel multi-agent transfer learning framework is proposed that significantly accelerates the learning process and surpasses state-of-the-art deep RL methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.

References

Showing 1–10 of 37 references
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR: An adaptation of actor-critic methods is presented that considers the action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination.
Simultaneously Learning and Advising in Multiagent Reinforcement Learning
TLDR: A multiagent advising framework is proposed in which multiple agents can advise each other while learning in a shared environment, and it is shown that the learning process is improved by incorporating this kind of advice.
Learning Hierarchical Teaching in Cooperative Multiagent Reinforcement Learning
TLDR: The proposed framework solves difficulties faced by prior work on multiagent teaching when operating in domains with long horizons, delayed rewards, and continuous states/actions by leveraging temporal abstraction and deep function approximation.
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
TLDR: A decentralized single-task learning approach that is robust to concurrent interactions of teammates is introduced, and an approach is presented for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.
Coordinated Multi-Agent Imitation Learning
TLDR: It is shown that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines, and the method integrates unsupervised structure learning with conventional imitation learning.
Cooperative Multi-Agent Learning: The State of the Art
TLDR: This survey attempts to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics, and finds that this broad view leads to a division of the work into two categories.
Learning Multiagent Communication with Backpropagation
TLDR: A simple neural model called CommNet is explored that uses continuous communication for fully cooperative tasks; the agents' ability to learn to communicate amongst themselves is demonstrated, yielding improved performance over non-communicative agents and baselines.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
TLDR: A Q-learning-like algorithm for finding optimal policies is presented, and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
Counterfactual Multi-Agent Policy Gradients
TLDR: A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients is proposed, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Learning to Teach in Cooperative Multiagent Reinforcement Learning
TLDR: Empirical comparisons against state-of-the-art teaching methods show that the teaching agents in LeCTR not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.