Corpus ID: 3622509

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

@inproceedings{Raileanu2018ModelingOU,
  title={Modeling Others using Oneself in Multi-Agent Reinforcement Learning},
  author={Roberta Raileanu and Emily L. Denton and Arthur D. Szlam and R. Fergus},
  booktitle={ICML},
  year={2018}
}
We consider the multi-agent reinforcement learning setting with imperfect information, in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so each agent must infer the other player's hidden goal from its observed behavior in order to solve the task. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and…
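The truncated abstract already names the core mechanism: the agent reuses its own policy network, conditioned on an inferred goal embedding, to explain the other agent's observed actions. A minimal PyTorch sketch of that idea follows; the network shapes and the helper `infer_other_goal` are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Policy(nn.Module):
    """Goal-conditioned policy; the SAME network is reused to model the other agent."""
    def __init__(self, obs_dim, goal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))  # action logits

def infer_other_goal(policy, other_obs, other_actions, goal_dim, steps=50, lr=0.1):
    """Optimize a goal embedding so the agent's OWN policy explains the other
    agent's observed actions (gradient descent on their negative log-likelihood)."""
    z_other = torch.zeros(goal_dim, requires_grad=True)
    opt = torch.optim.SGD([z_other], lr=lr)
    for _ in range(steps):
        logits = policy(other_obs, z_other.expand(other_obs.size(0), -1))
        loss = F.cross_entropy(logits, other_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z_other.detach()

# Hypothetical usage with random data:
policy = Policy(obs_dim=8, goal_dim=4, n_actions=5)
obs, acts = torch.randn(16, 8), torch.randint(0, 5, (16,))
z_other = infer_other_goal(policy, obs, acts, goal_dim=4)
```

The inferred `z_other` can then condition the agent's own decision-making; the key point is that no separate opponent network is trained — only the goal embedding is optimized through the agent's own policy.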
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
TLDR
This work proposes a modification that identifies the underlying opponent model using only local information of the controlled agent, such as its observations, actions, and rewards, and shows that the resulting opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks compared with another modeling method.
On Memory Mechanism in Multi-Agent Reinforcement Learning
TLDR
This paper shows that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however, one must be cautious of agents achieving effective memoryfulness through other means.
Learning Latent Representations to Influence Multi-Agent Interaction
TLDR
This work proposes a reinforcement-learning-based framework for learning latent representations of an agent's policy: the ego agent identifies the relationship between its behavior and the other agent's future strategy, and leverages these latent dynamics to influence the other agent, purposely guiding it towards policies suitable for co-adaptation.
Opponent Modelling with Local Information Variational Autoencoders
TLDR
A new modelling technique based on variational autoencoders that uses only the local observations of the agent under control: its observed world state, chosen actions, and received rewards. It achieves significantly higher returns than a baseline without opponent modeling, and comparable performance to an ideal baseline which has full access to opponent information.
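As a rough illustration of the local-information idea summarized above, here is a sketch under stated assumptions: the encoder sees only the controlled agent's trajectory features (its observations, actions, and rewards, concatenated into `local_traj`), while the opponent's actions serve as a reconstruction target available only at training time. All names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalEncoder(nn.Module):
    """VAE encoder over the controlled agent's LOCAL trajectory features."""
    def __init__(self, local_dim, z_dim=8, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(local_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, local_traj):
        h = self.body(local_traj)
        return self.mu(h), self.logvar(h)

def vae_step(encoder, decoder, local_traj, opp_actions):
    """One training loss: reconstruct opponent actions from a latent inferred
    purely from local information, plus the usual KL regularizer."""
    mu, logvar = encoder(local_traj)
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterization
    recon = F.cross_entropy(decoder(z), opp_actions)      # training-time target
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + kl

# Hypothetical usage: decoder is a simple opponent-action prediction head.
enc = LocalEncoder(local_dim=12)
dec = nn.Linear(8, 5)  # z_dim -> number of opponent actions
loss = vae_step(enc, dec, torch.randn(32, 12), torch.randint(0, 5, (32,)))
```

At execution time only the encoder is needed, so the embedding can condition the agent's policy without any access to opponent information.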
Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
TLDR
This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning; the methods range from modifications of the training procedure to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning.
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
TLDR
The results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards.
Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions
TLDR
This work proposes a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions, and empirically demonstrates that this method outperforms existing work in multi-agent tasks when facing unseen agents.
Policy Adaptive Multi-agent Deep Deterministic Policy Gradient
We propose a novel approach to address one aspect of the non-stationarity problem in multi-agent reinforcement learning (RL), where the other agents may alter their policies due to environment…
Learning to Penalize Other Learning Agents
A key challenge in AI is the development of algorithms that are capable of cooperative behavior in interactions involving multiple independent machines or individuals. Of particular interest are…
Local Information Opponent Modelling Using Variational Autoencoders
TLDR
A new modelling technique based on variational autoencoders, trained to reconstruct the local actions and observations of the opponent from embeddings which depend only on the local observations of the modelling agent; it achieves comparable performance to an ideal baseline which has full access to opponent information, and significantly higher returns than a baseline method which does not use the learned embeddings.

References

Showing 1-10 of 41 references
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods is presented that considers the action policies of other agents and is able to successfully learn policies requiring complex multi-agent coordination.
Learning with Opponent-Learning Awareness
TLDR
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoners' dilemma, while independent learning does not; LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
Opponent Modeling in Deep Reinforcement Learning
TLDR
Inspired by the recent success of deep reinforcement learning, this work presents neural-based models that jointly learn a policy and the behavior of opponents, using a Mixture-of-Experts architecture to encode observations of the opponents into a deep Q-network.
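The Mixture-of-Experts architecture mentioned here can be sketched as a small gating network that weights several Q-value heads according to an embedding of the opponent's observed behavior. The layout below is an assumption in the spirit of that design (the paper's DRON-MoE), not the exact network.

```python
import torch
import torch.nn as nn

class MoEQNetwork(nn.Module):
    """K expert Q-heads over the state, mixed by a gate fed with opponent features."""
    def __init__(self, state_dim, opp_dim, n_actions, k_experts=4, hidden=64):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(k_experts)
        ])
        self.gate = nn.Sequential(nn.Linear(opp_dim, k_experts), nn.Softmax(dim=-1))

    def forward(self, state, opp_features):
        q_each = torch.stack([e(state) for e in self.experts], dim=1)  # (B, K, A)
        w = self.gate(opp_features).unsqueeze(-1)                      # (B, K, 1)
        return (w * q_each).sum(dim=1)                                 # (B, A)

# Hypothetical shapes: batch of 32 states, 6-dim opponent embedding.
q_values = MoEQNetwork(state_dim=10, opp_dim=6, n_actions=4)(
    torch.randn(32, 10), torch.randn(32, 6))
```

The gate lets the Q-function switch smoothly between expert value estimates as the opponent's apparent strategy changes.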
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
TLDR
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Maintaining cooperation in complex social dilemmas using deep reinforcement learning
TLDR
This work shows how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice, provokable, and forgiving, and shows both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas.
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
TLDR
A decentralized single-task learning approach that is robust to concurrent interactions of teammates is introduced, along with an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
TLDR
This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games, and characterizes how learned behavior in each domain changes as a function of environmental factors including resource abundance.
Learning Multiagent Communication with Backpropagation
TLDR
A simple neural model called CommNet, which uses continuous communication for fully cooperative tasks, is explored; the agents' ability to learn to communicate amongst themselves is demonstrated, yielding improved performance over non-communicative agents and baselines.
Cooperative Inverse Reinforcement Learning
TLDR
It is shown that computing optimal joint policies in CIRL games can be reduced to solving a POMDP; it is proved that optimality in isolation is suboptimal in CIRL; and an approximate CIRL algorithm is derived.
Apprenticeship learning via inverse reinforcement learning
TLDR
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.