# Modeling Others using Oneself in Multi-Agent Reinforcement Learning

    @inproceedings{Raileanu2018ModelingOU,
      title     = {Modeling Others using Oneself in Multi-Agent Reinforcement Learning},
      author    = {Roberta Raileanu and Emily L. Denton and Arthur D. Szlam and R. Fergus},
      booktitle = {ICML},
      year      = {2018}
    }

We consider the multi-agent reinforcement learning setting with imperfect information, in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so each agent must infer the other agent's hidden goal from its observed behavior in order to solve the task. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and…
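The core SOM idea above — reusing one's own policy to explain the other agent's behavior by optimizing a candidate goal embedding — can be sketched as follows. This is a minimal toy illustration under stated assumptions, not the paper's implementation: the linear softmax policy, the dimensions, and the hand-derived gradient step are all illustrative stand-ins for the learned neural policy SOM actually backpropagates through.

```python
import math
import random

random.seed(0)

OBS_DIM, GOAL_DIM, N_ACT = 3, 2, 4


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


class LinearPolicy:
    """Toy stand-in for the agent's policy network: logits = W @ [obs ; goal]."""

    def __init__(self):
        self.W = [[random.uniform(-1, 1) for _ in range(OBS_DIM + GOAL_DIM)]
                  for _ in range(N_ACT)]

    def logits(self, obs, goal):
        x = obs + goal  # concatenate observation and goal embedding
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def act(self, obs, goal):
        p = softmax(self.logits(obs, goal))
        return max(range(N_ACT), key=lambda a: p[a])


def infer_other_goal(policy, observed, steps=200, lr=0.5):
    """SOM-style inference: run *our own* policy on the other agent's
    observations and optimize only the goal embedding z to make the
    observed actions likely (cross-entropy gradient descent)."""
    z = [0.0] * GOAL_DIM
    for _ in range(steps):
        grad = [0.0] * GOAL_DIM
        for obs, act in observed:
            p = softmax(policy.logits(obs, z))
            # d(cross-entropy)/d(logit_a) = p_a - 1[a == act]
            for a in range(N_ACT):
                err = p[a] - (1.0 if a == act else 0.0)
                for j in range(GOAL_DIM):
                    grad[j] += err * policy.W[a][OBS_DIM + j]
        z = [zj - lr * g / len(observed) for zj, g in zip(z, grad)]
    return z


policy = LinearPolicy()
true_goal = [1.0, -1.0]
# The "other agent" acts with the same policy but a hidden goal.
trajectory = []
for _ in range(20):
    obs = [random.uniform(-1, 1) for _ in range(OBS_DIM)]
    trajectory.append((obs, policy.act(obs, true_goal)))

z_hat = infer_other_goal(policy, trajectory)
agree = sum(policy.act(o, z_hat) == a for o, a in trajectory)
print(f"inferred goal explains {agree}/{len(trajectory)} observed actions")
```

The key design point, mirrored here, is that no separate opponent-model network is trained: the only free parameters during inference are the entries of the goal embedding `z`.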


#### 93 Citations

Variational Autoencoders for Opponent Modeling in Multi-Agent Systems

- Computer Science, Mathematics
- ArXiv
- 2020

This work proposes a modification that attempts to identify the underlying opponent model using only local information of the controlled agent, such as its observations, actions, and rewards, and shows that the opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks against another modeling method.

On Memory Mechanism in Multi-Agent Reinforcement Learning

- Computer Science
- ArXiv
- 2019

This paper shows that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however, one must be cautious of agents achieving effective memory through other means.

Learning Latent Representations to Influence Multi-Agent Interaction

- Computer Science
- ArXiv
- 2020

This work proposes a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy and leverages these latent dynamics to influence the other agent, purposely guiding it towards policies suitable for co-adaptation.

Opponent Modelling with Local Information Variational Autoencoders

- Computer Science
- ArXiv
- 2020

A new modelling technique based on variational autoencoders which uses only the local observations of the agent under control (its observed world state, chosen actions, and received rewards), achieving significantly higher returns than a non-modelling baseline and performance comparable to an ideal baseline with full access to opponent information.

Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

- Computer Science, Mathematics
- ArXiv
- 2019

This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning, and methods range from modifications in the training procedure, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning.

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

- Computer Science
- AIIDE
- 2019

The results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards.

Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions

- Computer Science
- ArXiv
- 2021

This work proposes a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions and empirically demonstrates that this method outperforms existing work in multi-agent tasks when facing unseen agents.

Policy Adaptive Multi-agent Deep Deterministic Policy Gradient

- Computer Science
- PRIMA
- 2020

We propose a novel approach to address one aspect of the non-stationarity problem in multi-agent reinforcement learning (RL), where the other agents may alter their policies due to environment…

Learning to Penalize Other Learning Agents

- 2021

A key challenge in AI is the development of algorithms that are capable of cooperative behavior in interactions involving multiple independent machines or individuals. Of particular interest are…

Local Information Opponent Modelling Using Variational Autoencoders

- Computer Science
- 2020

A new modelling technique based on variational autoencoders, which is trained to reconstruct the local actions and observations of the opponent from embeddings that depend only on the local observations of the modelling agent. It achieves performance comparable to an ideal baseline with full access to the opponent's information, and significantly higher returns than a baseline method which does not use the learned embeddings.

#### References

Showing 1-10 of 41 references

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

- Computer Science, Mathematics
- NIPS
- 2017

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

Learning with Opponent-Learning Awareness

- Computer Science
- AAMAS
- 2018

Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoners' dilemma, while independent learning does not. LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher-order gradient-based methods.

Opponent Modeling in Deep Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2016

Inspired by the recent success of deep reinforcement learning, this work presents neural-based models that jointly learn a policy and the behavior of opponents, and uses a Mixture-of-Experts architecture to encode observations of the opponent into a deep Q-Network.
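The Mixture-of-Experts encoding mentioned above can be sketched in a few lines. This is a hedged toy sketch, not the paper's architecture: the linear Q-heads, the dimensions, and the names (`q_values`, `gate_W`) are all illustrative assumptions; the idea carried over is that a gating network over opponent features mixes several expert Q-heads.

```python
import math
import random

random.seed(1)

STATE_DIM, OPP_DIM, N_ACT, N_EXPERTS = 4, 3, 2, 3


def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def linear(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_mat(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# One linear Q-head per expert, plus a gate over opponent features.
experts = [rand_mat(N_ACT, STATE_DIM) for _ in range(N_EXPERTS)]
gate_W = rand_mat(N_EXPERTS, OPP_DIM)


def q_values(state, opp_feats):
    """Mixture-of-Experts style Q-values: mix expert heads with weights
    predicted from the opponent's observed features."""
    g = softmax(linear(gate_W, opp_feats))          # gating over experts
    per_expert = [linear(W, state) for W in experts]  # each expert's Q(s, .)
    return [sum(g[k] * per_expert[k][a] for k in range(N_EXPERTS))
            for a in range(N_ACT)]


state = [0.2, -0.5, 0.1, 0.9]
opp = [1.0, 0.0, -0.3]
q = q_values(state, opp)
greedy = max(range(N_ACT), key=lambda a: q[a])
print("Q:", q, "greedy action:", greedy)
```

Because the gate depends only on opponent features, different observed opponents shift weight between experts, so the same network can represent distinct best responses without retraining a head per opponent.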

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

- Computer Science
- NIPS
- 2016

By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.

Maintaining cooperation in complex social dilemmas using deep reinforcement learning

- Computer Science
- ArXiv
- 2017

This work shows how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice, provokable, and forgiving, and shows both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas.

Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability

- Computer Science
- ICML
- 2017

A decentralized single-task learning approach that is robust to concurrent interactions of teammates is introduced, and an approach is presented for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

- Computer Science
- AAMAS
- 2017

This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games, and characterizes how learned behavior in each domain changes as a function of environmental factors, including resource abundance.

Learning Multiagent Communication with Backpropagation

- Computer Science, Mathematics
- NIPS
- 2016

A simple neural model is explored, called CommNet, that uses continuous communication for fully cooperative tasks and the ability of the agents to learn to communicate amongst themselves is demonstrated, yielding improved performance over non-communicative agents and baselines.

Cooperative Inverse Reinforcement Learning

- Computer Science
- NIPS
- 2016

It is shown that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, it is proved that optimality in isolation is suboptimal in CIRL, and an approximate CIRL algorithm is derived.

Apprenticeship learning via inverse reinforcement learning

- Computer Science
- ICML
- 2004

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.