Corpus ID: 50781772

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

@article{Kapoor2018MultiAgentRL,
  title={Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches},
  author={Sanyam Kapoor},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.09427}
}
Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times, and its success manifests itself in the form of human-level performance on games like Go. While RL is emerging as a practical component in real-life systems, most successes have been in single-agent domains. This report will instead specifically focus on challenges that are…

Self-Optimization in Smart Production Systems using Distributed Reinforcement Learning

This paper introduces a novel approach for self-learning in highly flexible, modular manufacturing systems enabling fast reconfiguration and online adaptation to changing production requirements based on the recently developed deep deterministic policy gradient approach.

A General Framework for Learning Mean-Field Games

Experiments on an equilibrium product pricing problem demonstrate that two specific instantiations of GMF-V with Q-learning and GMF-P with trust region policy optimization are both efficient and robust in the general mean-field game (GMFG) setting.

Learning Mean-Field Games

A Q-learning algorithm with Boltzmann policy (GMF-Q), with analysis of convergence property and computational complexity, is proposed for simultaneous learning and decision-making in stochastic games with a large population.

Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance

The paper presents an improved multiagent reinforcement learning algorithm, the multiagent joint proximal policy optimization (MAJPPO) algorithm with centralized learning and decentralized execution, which enhances collaboration and increases the sum of reward values obtained by the multiagent system.

Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning

A deep-Q-network model in a multi-agent reinforcement learning setting is used to guide the scheduling of multi-workflows over infrastructure-as-a-service clouds. Experimental results suggest that the proposed approach outperforms traditional ones, e.g., non-dominated sorting genetic algorithm-II, multi-objective particle swarm optimization, and game-theoretic-based greedy algorithms, in terms of the optimality of the scheduling plans generated.

Real-time tree search with pessimistic scenarios

A tree search technique in which a deterministic, pessimistic scenario is used beyond a specified depth. Because the deterministic scenario involves no branching, the search can take into account events that may occur far ahead in the future.

Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition

A tree search technique in which a deterministic, pessimistic scenario is used beyond a specified depth. Because the deterministic scenario involves no branching, the search can take into account events that may occur far ahead in the future.

Two-Stage Hybrid Network Clustering Using Multi-Agent Reinforcement Learning

The two-stage hybrid approach outperforms methods employing single-agent reinforcement learning (SARL), requires fewer candidate broker nodes, and converges faster.

References

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

Evolutionary Dynamics of Multi-Agent Learning: A Survey

This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively, and provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning.

Markov Games as a Framework for Multi-Agent Reinforcement Learning

Emergent Complexity via Multi-Agent Competition

This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics and points out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty.

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.

StarCraft II: A New Challenge for Reinforcement Learning

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game that offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures and gives initial baseline results for neural networks trained from this data to predict game outcomes and player actions.