Corpus ID: 238408283

No-Press Diplomacy from Scratch

Anton Bakhtin, David J. Wu, Adam Lerer, Noam Brown

Prior AI successes in complex games have largely focused on settings with at most hundreds of actions at each decision point. In contrast, Diplomacy is a game with more than 10^20 possible actions per turn. Previous attempts to address games with large branching factors, such as Diplomacy, StarCraft, and Dota, used human data to bootstrap the policy or used handcrafted reward shaping. In this paper, we describe an algorithm for action exploration and equilibrium approximation in games with…

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

This work introduces a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and shows that using this algorithm for search in no-press Diplomacy yields a policy that matches the human prediction accuracy of imitation learning while being substantially stronger.

Player of Games

This work introduces Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning, and is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games.

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

This work shows that a single algorithm—a simple extension to mirror descent with proximal regularization that is called magnetic mirror descent (MMD)—can produce strong results in both settings, despite their fundamental differences, and proves that MMD converges linearly to QREs in extensive-form games.

A Unified Perspective on Deep Equilibrium Finding

A unified perspective on deep equilibrium finding that generalizes both PSRO and CFR is proposed, and it is demonstrated that the approach can outperform both frameworks.



Efficient Φ-Regret Minimization in Extensive-Form Games via Online Mirror Descent

An improved algorithm with balancing techniques is designed that achieves a sharp EFCE-regret bound under bandit feedback in an EFG with X information sets, A actions, and T episodes, and this bound matches the information-theoretic lower bound.

Reinforcement Learning in Practice: Opportunities and Challenges

This article is a gentle discussion about the field of reinforcement learning in practice, about opportunities and challenges, touching a broad range of topics, with perspectives and without technical

Characterizing the Decidability of Finite State Automata Team Games with Communication

A new model of limited communication for multiplayer team games of imperfect information is proposed and it is proved that the Team DFA Game and Team Formula Game remain undecidable when players have a rate of communication which is less than the rate at which they make moves in the game.

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social



Learning to Play No-Press Diplomacy with Best Response Policy Iteration

This work considers Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions, and proposes a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves.

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

An agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via external regret minimization and achieves a rank of 23 out of 1,128 human players when playing anonymous games on a popular Diplomacy website is described.

No Press Diplomacy: Modeling Multi-Agent Gameplay

This work focuses on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players, and presents DipNet, a neural-network-based policy model for No Press Diplomacy.

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

This paper generalizes the AlphaZero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous ones such as InRL.

Mastering the game of Go without human knowledge

An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Simultaneous Abstraction and Equilibrium Finding in Games

This work introduces a method that combines abstraction with equilibrium finding by enabling actions to be added to the abstraction at run time, which allows an agent to begin learning with a coarse abstraction, and then to strategically insert actions at points that the strategy computed in the current abstraction deems important.

Grandmaster level in StarCraft II using multi-agent reinforcement learning

The agent, AlphaStar, which uses a multi-agent reinforcement learning algorithm, is evaluated and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

Mastering the game of Go with deep neural networks and tree search

Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.