• Corpus ID: 50781772

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

@article{Kapoor2018MultiAgentRL,
  title={Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches},
  author={Sanyam Kapoor},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.09427}
}
  • Sanyam Kapoor
  • Published 25 July 2018
  • Computer Science, Mathematics
  • ArXiv
Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times and success manifests itself in the form of human-level performance on games like Go. While RL is emerging as a practical component in real-life systems, most successes have been in Single Agent domains. This report will instead specifically focus on challenges that are…
Self-Optimization in Smart Production Systems using Distributed Reinforcement Learning
TLDR
This paper introduces a novel approach for self-learning in highly flexible, modular manufacturing systems enabling fast reconfiguration and online adaptation to changing production requirements based on the recently developed deep deterministic policy gradient approach.
Multi-agent actor centralized-critic with communication
TLDR
A3C3 is proposed, a multi-agent actor-critic algorithm that uses a centralized critic to estimate a value function, decentralized actors to approximate each agent’s policy function, and decentralized communication networks for each agent to share relevant information with its team.
Multi-Agent Strategies for Pommerman
Despite the advances in reinforcement learning in a wide variety of applications, the domain of multi-agent systems remains vastly unexplored. In this work, we discuss strategies for approaching
Research on Tensor-Based Cooperative and Competitive in Multi-Agent Reinforcement Learning
TLDR
This research introduces tensors to store various data, addressing the challenges of data representation in multi-agent associations, and introduces an algorithm that stores the training records and data of multiple agents in tensors.
Decentralized learning of energy optimal production policies using PLC-informed reinforcement learning
TLDR
This paper proposes Teacher-Student RL to distill the available control code of the individual modules into a neural network which is subsequently optimized using standard RL, a novel approach to distributed optimization in production systems using reinforcement learning with particular emphasis on energy efficient production.
Learning Mean-Field Games
TLDR
A Q-learning algorithm with Boltzmann policy (GMF-Q), with analysis of convergence property and computational complexity, is proposed for simultaneous learning and decision-making in stochastic games with a large population.
Learning Mean-Field Games
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a
Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance
TLDR
The paper presents an improved multiagent reinforcement learning algorithm, the multiagent joint proximal policy optimization (MAJPPO) algorithm with centralized learning and decentralized execution, which enhances collaboration and increases the sum of reward values obtained by the multiagent system.
Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning
TLDR
A deep-Q-network model in a multi-agent reinforcement learning setting guides the scheduling of multi-workflows over infrastructure-as-a-service clouds; experimental results suggest that the proposed approach outperforms traditional ones, e.g., non-dominated sorting genetic algorithm-II, multi-objective particle swarm optimization, and game-theoretic-based greedy algorithms, in terms of optimality of the scheduling plans generated.
Real-time tree search with pessimistic scenarios
TLDR
A tree search technique in which a deterministic, pessimistic scenario is used beyond a specified depth; because the deterministic scenario involves no branching, the search can take into account events that may occur far ahead in the future.

References

SHOWING 1-10 OF 39 REFERENCES
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
Evolutionary Dynamics of Multi-Agent Learning: A Survey
TLDR
This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively, and provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
TLDR
A Q-learning-like algorithm for finding optimal policies and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
Multi-agent reinforcement learning as a rehearsal for decentralized planning
TLDR
A novel MARL approach is proposed in which agents are allowed to rehearse with information that will not be available during policy execution, and it is shown experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal-based learners.
Emergent Complexity via Multi-Agent Competition
TLDR
This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics and points out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty.
Counterfactual Multi-Agent Policy Gradients
TLDR
A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
Value-Decomposition Networks For Cooperative Multi-Agent Learning
TLDR
This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
TLDR
This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.