Corpus ID: 3330300

Multi-Agent Generative Adversarial Imitation Learning

@inproceedings{Song2018MultiAgentGA,
  title={Multi-Agent Generative Adversarial Imitation Learning},
  author={Jiaming Song and Hongyu Ren and Dorsa Sadigh and Stefano Ermon},
  booktitle={NeurIPS},
  year={2018}
}
Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. We propose a new framework for multi-agent imitation learning for general Markov games, where we build upon a generalized notion of inverse reinforcement learning. We further introduce a practical multi-agent actor… 
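
The adversarial structure sketched in the abstract pairs each agent's policy with a discriminator that is trained to tell expert behavior from policy rollouts, with the discriminator's output serving as a surrogate reward for a policy-gradient step. The toy below illustrates that loop on a single-state, two-agent game under assumed conventions (logistic discriminators, REINFORCE policy updates, a hard-coded "expert"); it is a minimal sketch of the idea, not the authors' MAGAIL implementation.

import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS = 2, 4

# Hypothetical expert demonstrations: each agent mostly plays action 2.
expert_actions = [rng.choice(N_ACTIONS, size=500, p=[0.05, 0.05, 0.85, 0.05])
                  for _ in range(N_AGENTS)]

theta = [np.zeros(N_ACTIONS) for _ in range(N_AGENTS)]  # policy logits
w = [np.zeros(N_ACTIONS) for _ in range(N_AGENTS)]      # discriminator weights

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(2000):
    for i in range(N_AGENTS):
        pi = softmax(theta[i])
        a_pol = rng.choice(N_ACTIONS, size=64, p=pi)    # policy rollouts
        a_exp = rng.choice(expert_actions[i], size=64)  # expert minibatch
        # Discriminator ascent: push D_i toward 1 on expert samples, 0 on policy samples.
        for a, label in ((a_exp, 1.0), (a_pol, 0.0)):
            np.add.at(w[i], a, 0.1 * (label - sigmoid(w[i][a])))
        # Policy ascent (REINFORCE) on the surrogate reward log D_i(a).
        r = np.log(sigmoid(w[i][a_pol]) + 1e-8)
        r -= r.mean()                                   # simple baseline
        grad = np.zeros(N_ACTIONS)
        for a, ri in zip(a_pol, r):
            g = -pi.copy()
            g[a] += 1.0                                 # gradient of log pi(a)
            grad += ri * g
        theta[i] += 0.05 * grad / len(a_pol)

print([np.round(softmax(t), 2) for t in theta])  # both policies concentrate on action 2

In the full algorithm the discriminators condition on states as well as actions, and the policy step is the paper's multi-agent actor-critic update rather than plain REINFORCE.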

Citations

Multi-Agent Adversarial Inverse Reinforcement Learning
TLDR
MA-AIRL is proposed, a new framework for multi-agent inverse reinforcement learning that is effective and scalable for Markov games with high-dimensional state-action spaces and unknown dynamics, and significantly outperforms prior methods in terms of policy imitation.
Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems
TLDR
This work is the first to combine self-imitation learning with generative adversarial imitation learning (GAIL) and apply it to cooperative multiagent systems; it produces state-of-the-art results and even outperforms JALs in terms of both convergence speed and final performance.
Imitation Learning From Inconcurrent Multi-Agent Interactions
TLDR
The experimental results demonstrate that, compared to state-of-the-art baselines, the iMA-IL model can better infer the policy of each expert agent using demonstration data collected from inconcurrent decision-making scenarios.
Conditional Imitation Learning for Multi-Agent Games
TLDR
A model is proposed that learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace, addressing the difficulties of scalability and data scarcity.
2.3 Inverse Reinforcement Learning and Imitation Learning
TLDR
Two methods that apply forms of imitation learning to the problem of learning coordinated behaviors are shown to have a close connection to multi-agent actor-critic models, and to avoid relative overgeneralization when the right demonstrations are given.
Sample-efficient Adversarial Imitation Learning from Observation
TLDR
An algorithm is proposed that addresses the sample-inefficiency problem by drawing on ideas from trajectory-centric reinforcement learning algorithms, improving learning rate and efficiency.
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
TLDR
A multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works, improving sample efficiency over state-of-the-art baselines across both small- and large-scale tasks.
Multi-Agent Imitation Learning with Copulas
TLDR
The proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
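
The separation described here mirrors Sklar's theorem: any joint distribution over the agents' behaviors factors into per-agent marginals F_i composed with a copula C that carries only the dependence structure,

F(x_1, \ldots, x_N) = C\big(F_1(x_1), \ldots, F_N(x_N)\big),

so the marginals capture each agent's local behavioral patterns while C alone captures the coordination among agents.
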
Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning
TLDR
It is proved that convergence guarantees for the imitation learning process are preserved under reward augmentation, and improved performance over traditional imitation learning algorithms is demonstrated, both in the local actions of a single agent and in the emergent properties of complex multi-agent settings.
Multi-Agent Imitation Learning for Driving Simulation
TLDR
Compared with single-agent GAIL policies, policies generated by the PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

References

Showing 1-10 of 77 references
Generative Adversarial Imitation Learning
TLDR
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
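
The analogy to generative adversarial networks takes the form of GAIL's saddle-point objective, which trains a discriminator D against the policy \pi with an entropy regularizer H:

\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}[\log D(s, a)] + \mathbb{E}_{\pi_E}[\log(1 - D(s, a))] - \lambda H(\pi)

Here \pi_E is the expert policy; the learned policy improves by making its state-action occupancy indistinguishable from the expert's.
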
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
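
Concretely, the adaptation (MADDPG) uses centralized critics with decentralized actors: each agent i's policy \mu_i(o_i) is updated through a critic Q_i that conditions on joint state information x and all agents' actions, giving (up to notation) the gradient

\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{x, a \sim \mathcal{D}} \left[ \nabla_{\theta_i} \mu_i(a_i \mid o_i) \, \nabla_{a_i} Q_i^{\mu}(x, a_1, \ldots, a_N) \big|_{a_i = \mu_i(o_i)} \right].

Because Q_i sees the other agents' actions, each agent's learning target stays stationary even as the other policies change.
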
Coordinated Multi-Agent Imitation Learning
TLDR
It is shown that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines, and the method integrates unsupervised structure learning with conventional imitation learning.
Inverse Reinforcement Learning in Swarm Systems
TLDR
This paper introduces the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization, and proposes a novel heterogeneous learning scheme that is particularly tailored to the swarm setting.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
TLDR
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
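
The actor update follows the deterministic policy gradient theorem: with actor \mu(s \mid \theta) and critic Q(s, a \mid \phi), the gradient of the objective is approximated by differentiating through the critic,

\nabla_{\theta} J \approx \mathbb{E}_{s \sim \rho} \left[ \nabla_{a} Q(s, a \mid \phi) \big|_{a = \mu(s \mid \theta)} \, \nabla_{\theta} \mu(s \mid \theta) \right],

which is what lets the method operate over continuous action spaces where directly maximizing Q over actions is infeasible.
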
Third-Person Imitation Learning
TLDR
The method's primary insight is that recent advances in domain confusion can be used to yield domain-agnostic features, which are crucial during training.
Efficient Reductions for Imitation Learning
TLDR
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
TLDR
This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that the approach outperforms previous ones on two challenging imitation learning problems and a benchmark sequence-labeling problem.
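
The iterative algorithm summarized here is DAgger (dataset aggregation). The toy sketch below, with a hypothetical integer-state environment and expert standing in for a real problem, shows the essential loop: the expert labels the states visited by the learner's own policy, all labels are aggregated into one dataset, and the policy is retrained on the aggregate.

import numpy as np

rng = np.random.default_rng(0)
N_STATES = 10

def expert(s):
    return int(s < 7)                     # toy expert: move right until state 7

def rollout(policy, horizon=20):
    s, visited = 0, []
    for _ in range(horizon):
        visited.append(s)
        s = min(max(s + (1 if policy[s] else -1), 0), N_STATES - 1)
    return visited

dataset = []                              # aggregated (state, expert label) pairs
policy = rng.integers(0, 2, N_STATES)     # arbitrary initial policy

for _ in range(10):
    for s in rollout(policy):             # states the learner's own policy visits
        dataset.append((s, expert(s)))    # queried expert labels
    # Retrain on the aggregate: here, a majority vote of labels per state.
    for s in range(N_STATES):
        labels = [a for st, a in dataset if st == s]
        if labels:
            policy[s] = int(np.mean(labels) >= 0.5)

print(policy)  # matches the expert on every state the learner reached

Retraining on the aggregated dataset corresponds to a follow-the-leader update, which is a no-regret strategy for the losses considered in the paper.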
Maximum Entropy Inverse Reinforcement Learning
TLDR
A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
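
The "globally normalized distribution over decision sequences" is the maximum-entropy trajectory model: with reward weights \theta and trajectory feature counts f_\zeta, a trajectory \zeta (under deterministic dynamics) has probability

P(\zeta \mid \theta) = \frac{\exp(\theta^{\top} f_{\zeta})}{Z(\theta)},

so trajectories with higher cumulative reward are exponentially more likely, and the partition function Z(\theta) provides the global normalization.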