• Corpus ID: 190000138

# Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

@inproceedings{Khadka2020EvolutionaryRL,
title={Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination},
author={Shauharda Khadka and Somdeb Majumdar and Kagan Tumer},
booktitle={ICML},
year={2020}
}
• Published in ICML 18 June 2019
• Computer Science
A key challenge for Multiagent RL (Reinforcement Learning) is the design of agent-specific, local rewards that are aligned with sparse global objectives. In this paper, we introduce MERL (Multiagent Evolutionary RL), a hybrid algorithm that does not require an explicit alignment between local and global objectives. MERL uses fast, policy-gradient based learning for each agent by utilizing their dense local rewards. Concurrently, an evolutionary algorithm is used to recruit agents into a team by…
21 Citations


Dynamic Skill Selection for Learning Joint Actions
• Computer Science
AAMAS
• 2021
MADyS (Multiagent Learning via Dynamic Skill Selection) is a bi-level optimization framework that learns to dynamically switch between multiple local skills to optimize sparse team objectives; it outperforms prior methods and provides intuitive visualizations of its skill-switching strategy.
MAEDyS: multiagent evolution via dynamic skill selection
• Computer Science
GECCO
• 2021
This work introduces MAEDyS, Multiagent Evolution via Dynamic Skill Selection, a hybrid bi-level optimization framework that augments evolutionary methods with policy gradient methods to generate effective coordination policies and shows that it outperforms prior methods.
Behavior Exploration and Team Balancing for Heterogeneous Multiagent Coordination
• Computer Science
AAMAS
• 2022
Behavior Exploration for Heterogeneous Teams (BEHT) is introduced: a multi-level learning framework that enables agents to progressively explore regions of the behavior space that promote team coordination on diverse goals, combining diversity search to maximize agent-specific rewards with evolutionary optimization to maximize the team-based fitness.
Balancing teams with quality-diversity for heterogeneous multiagent coordination
• Computer Science
GECCO Companion
• 2022
Behavior Exploration for Heterogeneous Teams (BEHT) is extended, and it is shown that BEHT allows agents to learn diverse synergies, as demonstrated by the diversity of acquired agent behavior in response to the changing environment and agent behaviors.
Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning
• Computer Science
ArXiv
• 2021
ECNet is explored: a simple method for minimizing communication penalties while maximizing a task-specific objective in MARL. Its optimization pipeline adopts REINFORCE and the Gumbel-Softmax reparameterization trick.
Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches
• Computer Science
ArXiv
• 2021
It is suggested that, for multiagent reinforcement learning to be successful, future research should address these challenges with an interdisciplinary approach, opening up new possibilities for more human-oriented solutions in multiagent reinforcement learning.
Entropy-based local fitnesses for evolutionary multiagent systems
• Computer Science
GECCO Companion
• 2022
This paper introduces Entropy-Based Local Fitnesses (EBLFs) that generate diverse behaviors for agents and produce robust team behaviors without requiring environmental knowledge and shows that the agents using EBLFs learn new skills in difficult environments with sparse feedback without requiring domain knowledge.
A survey and critique of multiagent deep reinforcement learning
• Computer Science
Autonomous Agents and Multi-Agent Systems
• 2019
A clear overview of current multiagent deep reinforcement learning (MDRL) literature is provided to help unify and motivate future research to take advantage of the abundant literature that exists in a joint effort to promote fruitful research in the multiagent community.
Is multiagent deep reinforcement learning the answer or the question? A brief survey
• Computer Science
ArXiv
• 2018
This article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature and provides guidelines to complement this emerging area by showcasing examples on how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL and providing general lessons learned from these works.
A Generalized Training Approach for Multiagent Learning
• Computer Science
ICLR
• 2020
This paper extends the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank, which is unique (and thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings, and establishes convergence guarantees in several classes of games.

## References

SHOWING 1-10 OF 54 REFERENCES
D++: Structural credit assignment in tightly coupled multiagent domains
• Computer Science
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
• 2016
This paper proposes a novel reward framework based on the idea of counterfactuals to tackle the coordination problem in tightly coupled domains and shows that the proposed algorithm provides superior performance compared to policies learned using either the global reward or the difference reward.
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
• Computer Science
ArXiv
• 2017
This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
Counterfactual Multi-Agent Policy Gradients
• Computer Science
AAAI
• 2018
Counterfactual multi-agent (COMA) policy gradients is a new multi-agent actor-critic method that uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Evolution-Guided Policy Gradient in Reinforcement Learning
• Computer Science
NeurIPS
• 2018
Evolutionary Reinforcement Learning (ERL) is a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA.
Dynamic potential-based reward shaping
• Computer Science
AAMAS
• 2012
This paper proves and demonstrates a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
• Computer Science
ICML
• 2017
Two methods, using a multi-agent variant of importance sampling to naturally decay obsolete data and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory, enable the successful combination of experience replay with multi-agent RL.
Learning with Opponent-Learning Awareness
• Computer Science
AAMAS
• 2018
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma, while independent learning does not, and LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods.
Collaborative Evolutionary Reinforcement Learning
• Computer Science
ICML
• 2019
Collaborative Evolutionary Reinforcement Learning (CERL) is introduced: a scalable framework comprising a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. It significantly outperforms its composite learners while remaining overall more sample-efficient.
Reward shaping for valuing communications during multi-agent coordination
• Computer Science
AAMAS
• 2009
This research presents a novel model of rational communication that uses reward shaping to value communications and employs this valuation in decentralised POMDP policy generation; an empirical evaluation of the benefits is presented in two domains.