Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments

@inproceedings{Ivanov2021BalancingRA,
  title={Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments},
  author={Dmitry Ivanov and Vladimir Egorov and Aleksei Shpilman},
  booktitle={AAMAS},
  year={2021}
}
Recent reinforcement learning studies extensively explore the interplay between cooperative and competitive behaviour in mixed environments. Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. As a consequence, purely rational agents often struggle to maintain cooperation. A prevalent approach to induce cooperative behaviour is to assign additional rewards based on other agents’ well-being… 

Evolutionary instability of selfish learning in repeated games

FMTL is shown to be superior to selfish learning, both individually and socially, across many different social dilemmas, and the results further corroborate previous theoretical attempts to explain why humans take into account their impact on others when making strategic decisions.

References

Rainbow: Combining Improvements in Deep Reinforcement Learning

This paper examines six extensions to the DQN algorithm and empirically studies their combination, showing that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance.

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.

Reducing Overestimation in Value Mixing for Cooperative Deep Multi-Agent Reinforcement Learning

This work proposes double QMIX, an end-to-end multi-agent Q-learning method that reduces value overestimation and trains decentralized agents’ policies in a centralized setting, and evaluates it in the StarCraft II micromanagement environment, where it shows better performance.

Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning

This paper considers a framework for this setting in which agents have individual preferences regarding how to accomplish the shared task, and empirically shows that there exist mixing schemes that outperform a purely task-oriented baseline.

Agent57: Outperforming the Atari Human Benchmark

This work proposes Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games and trains a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative.

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

The results demonstrate how the benefits of a good initialization can persist throughout learning, suggesting an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.

Dota 2 with Large Scale Deep Reinforcement Learning

By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

A survey and critique of multiagent deep reinforcement learning

A clear overview of current multiagent deep reinforcement learning (MDRL) literature is provided to help unify and motivate future research to take advantage of the abundant literature that exists in a joint effort to promote fruitful research in the multiagent community.

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

A new factorization method for MARL, QTRAN, is proposed, which is free from such structural constraints and takes on a new approach to transforming the original joint action-value function into an easily factorizable one, with the same optimal actions.

Understanding Contemporary Society: Theories of the Present

...