Preference Communication in Multi-Objective Normal-Form Games

@article{Rpke2021PreferenceCI,
  title={Preference Communication in Multi-Objective Normal-Form Games},
  author={Willem R{\"o}pke and Diederik M. Roijers and Ann Now{\'e} and Roxana Rădulescu},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.09191}
}
We study the problem of multiple agents learning concurrently in a multi-objective environment. Specifically, we consider two agents that repeatedly play a multi-objective normal-form game. In such games, the payoffs resulting from joint actions are vector valued. Taking a utility-based approach, we assume a utility function exists that maps vectors to scalar utilities and consider agents that aim to maximise the utility of expected payoff vectors. As agents do not necessarily know their… 
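
The optimisation target described in the abstract, the utility of the expected payoff vector, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2x2 payoff table, the product utility function, the names `expected_payoff` and `ser_utility`, and the simplification that both agents receive the same payoff vector.

```python
# Hypothetical 2x2 game with two objectives; the payoff values are illustrative only.
# PAYOFFS[i][j] is the payoff vector for row action i and column action j.
PAYOFFS = [
    [(4.0, 0.0), (2.0, 2.0)],
    [(2.0, 2.0), (0.0, 4.0)],
]


def expected_payoff(payoffs, row_strategy, col_strategy):
    """Expected payoff vector when both agents mix independently."""
    n_objectives = len(payoffs[0][0])
    expected = [0.0] * n_objectives
    for i, p_row in enumerate(row_strategy):
        for j, p_col in enumerate(col_strategy):
            for k in range(n_objectives):
                expected[k] += p_row * p_col * payoffs[i][j][k]
    return tuple(expected)


def utility(vector):
    """Assumed non-linear utility: the product of the two objectives."""
    return vector[0] * vector[1]


def ser_utility(payoffs, row_strategy, col_strategy, u=utility):
    """Utility of the expected payoff vector (utility applied after the expectation)."""
    return u(expected_payoff(payoffs, row_strategy, col_strategy))


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    print(expected_payoff(PAYOFFS, uniform, uniform))
    print(ser_utility(PAYOFFS, uniform, uniform))
```

In this toy game, mixing uniformly gives a balanced expected vector, which the assumed product utility rewards more than any joint action that yields only one objective.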

Bridging the Gap Between Single and Multi Objective Games

This work bridges the gap between the two models of multi-objective normal-form games by providing a theoretical guarantee that a game from one setting can always be transformed to a game in the other, and extends the theoretical results to include guaranteed equivalence of Nash equilibria.

On Nash equilibria in normal-form games with vectorial payoffs

It is shown that when assuming quasiconvex utility functions for players, the sets of pure strategy Nash equilibria under both optimisation criteria are equivalent, and this result is further extended to games in which players adhere to different optimisation criteria.

References


A Survey of Multi-Objective Sequential Decision-Making

This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.

Emergent Communication under Competition

A modified sender-receiver game is introduced to study the spectrum of partially-competitive scenarios and it is shown that communication is proportional to cooperation, and it can occur for partially competitive scenarios using standard learning algorithms.

Multiagent learning using a variable learning rate

Opponent Modelling for Reinforcement Learning in Multi-Objective Normal Form Games

A novel actor-critic formulation is contributed to allow reinforcement learning of mixed strategies in multi-objective multi-agent interactions with non-linear utilities under the scalarised expected returns optimisation criterion.

A utility-based analysis of equilibria in multi-objective normal-form games

It is demonstrated that the choice of optimization criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when nonlinear utility functions are used.
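
A small numeric sketch shows how the two criteria can disagree once the utility function is non-linear. The payoff table, the product utility, and the function names `ser` and `esr` are assumptions chosen for illustration, not the construction used in the cited work. Here SER is read as applying the utility to the expected payoff vector, and ESR as taking the expectation of the utility of each realised payoff vector.

```python
# Hypothetical 2x2 game with two objectives; the payoff values are illustrative only.
PAYOFFS = [
    [(4.0, 0.0), (2.0, 2.0)],
    [(2.0, 2.0), (0.0, 4.0)],
]


def u(vector):
    """Assumed non-linear utility: the product of the two objectives."""
    return vector[0] * vector[1]


def joint_probabilities(row_strategy, col_strategy):
    """Probability of each joint action under independent mixed strategies."""
    return {
        (i, j): p * q
        for i, p in enumerate(row_strategy)
        for j, q in enumerate(col_strategy)
    }


def ser(payoffs, row_strategy, col_strategy):
    """Scalarised expected returns: utility of the expected payoff vector."""
    probs = joint_probabilities(row_strategy, col_strategy)
    n_objectives = len(payoffs[0][0])
    expected = tuple(
        sum(prob * payoffs[i][j][k] for (i, j), prob in probs.items())
        for k in range(n_objectives)
    )
    return u(expected)


def esr(payoffs, row_strategy, col_strategy):
    """Expected scalarised returns: expectation of the utility of each outcome."""
    probs = joint_probabilities(row_strategy, col_strategy)
    return sum(prob * u(payoffs[i][j]) for (i, j), prob in probs.items())


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    print("SER:", ser(PAYOFFS, uniform, uniform))
    print("ESR:", esr(PAYOFFS, uniform, uniform))
```

With a linear utility the two values would coincide, because a linear function commutes with the expectation; the gap in this sketch comes entirely from the assumed product utility.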

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

This work considers Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions, and proposes a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves.

Reinforcement Learning: An Introduction. MIT Press, second edition, 2018

Opponent learning awareness and modelling in multi-objective normal form games

This work considers two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion and contributes novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting.

On the value of commitment

In game theory, it is well known that being able to commit to a strategy before other players move can be beneficial. In this paper, we analyze how much benefit a player can derive from commitment in

On Nash equilibria in normal-form games with vectorial payoffs

It is shown that when assuming quasiconvex utility functions for players, the sets of pure strategy Nash equilibria under both optimisation criteria are equivalent, and this result is further extended to games in which players adhere to different optimisation criteria.