Corpus ID: 238408283

No-Press Diplomacy from Scratch

@inproceedings{Bakhtin2021NoPressDF,
  title={No-Press Diplomacy from Scratch},
  author={Anton Bakhtin and David J. Wu and Adam Lerer and Noam Brown},
  booktitle={NeurIPS},
  year={2021}
}
Prior AI successes in complex games have largely focused on settings with at most hundreds of actions at each decision point. In contrast, Diplomacy is a game with more than 10^20 possible actions per turn. Previous attempts to address games with large branching factors, such as Diplomacy, StarCraft, and Dota, used human data to bootstrap the policy or used handcrafted reward shaping. In this paper, we describe an algorithm for action exploration and equilibrium approximation in games with…
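The truncated abstract describes equilibrium approximation in a game whose per-turn action space is combinatorial. As a rough, illustrative sketch only (not the paper's own algorithm), the snippet below runs plain external regret matching over a small utility matrix; in a game like Diplomacy such a matrix could only cover a handful of candidate actions per player, sampled by a policy network and valued by a learned value function. The function name, candidate-set size, and random utilities are all stand-ins.

```python
import numpy as np

def approx_equilibrium(utilities, iters=2000):
    """Regret matching over a small two-player zero-sum utility matrix.
    utilities[i, j] = payoff to player 0 when player 0 plays i and player 1 plays j.
    Returns the average strategies, which approximate a Nash equilibrium."""
    n0, n1 = utilities.shape
    regret = [np.zeros(n0), np.zeros(n1)]
    strat_sum = [np.zeros(n0), np.zeros(n1)]
    for _ in range(iters):
        strat = []
        for r in regret:
            pos = np.maximum(r, 0.0)
            strat.append(pos / pos.sum() if pos.sum() > 0 else np.full(r.size, 1.0 / r.size))
        u0 = utilities @ strat[1]        # player 0's action values vs. the opponent's mix
        u1 = -(strat[0] @ utilities)     # player 1 maximizes the negated payoff
        regret[0] += u0 - strat[0] @ u0
        regret[1] += u1 - strat[1] @ u1
        strat_sum[0] += strat[0]
        strat_sum[1] += strat[1]
    return [s / iters for s in strat_sum]

# Toy usage: random values stand in for value-network estimates over
# 20 sampled candidate actions per player.
sigma0, sigma1 = approx_equilibrium(np.random.randn(20, 20))
```

The average strategies of regret matching converge to an equilibrium in two-player zero-sum matrix games, which is why equilibrium search of this kind becomes tractable once the candidate action set is small.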
Modeling Strong and Human-Like Gameplay with KL-Regularized Search
TLDR
This work introduces a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and shows that using this algorithm for search in no-press Diplomacy yields a policy that matches the human prediction accuracy of imitation learning while being substantially stronger.
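For intuition about the KL-regularized objective this TLDR refers to (the paper's actual method is a regret minimizer whose details differ), the closed-form solution of a single KL-regularized policy-improvement step is easy to state. The sketch below assumes a vector of action values and an imitation-learned anchor policy; the function name and the choice of lam are illustrative.

```python
import numpy as np

def kl_regularized_policy(q_values, anchor_policy, lam=1.0):
    """Closed-form maximizer of  E_pi[Q] - lam * KL(pi || anchor):
    pi(a) is proportional to anchor(a) * exp(Q(a) / lam)."""
    logits = np.log(np.asarray(anchor_policy) + 1e-12) + np.asarray(q_values) / lam
    logits -= logits.max()               # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

Large lam keeps the policy close to the human imitation policy; small lam lets the action values dominate. The tradeoff between human prediction accuracy and playing strength reported in the TLDR is governed by this kind of regularization weight.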
Player of Games
TLDR
This work introduces Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning, and is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games.
A Unified Perspective on Deep Equilibrium Finding
TLDR
A unified perspective on deep equilibrium finding that generalizes both PSRO and CFR is proposed, and the approach is shown to outperform both frameworks.
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
TLDR
This work shows that a single algorithm—a simple extension to mirror descent with proximal regularization that is called magnetic mirror descent (MMD)—can produce strong results in both settings, despite their fundamental differences, and proves that MMD converges linearly to QREs in extensive-form games.
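As a hedged sketch of what mirror descent with a proximal "magnet" term can look like on a single probability simplex (the paper's exact parameterization may differ), the update below solves argmax_pi eta*<pi, q> - eta*alpha*KL(pi || magnet) - KL(pi || current) in closed form. The variable names and step sizes are assumptions.

```python
import numpy as np

def mmd_simplex_step(policy, q_values, magnet, eta=0.1, alpha=0.05):
    """One magnetic-mirror-descent-style step on the simplex (sketch).
    Closed form of the objective in the lead-in text:
      new(a) ∝ [ policy(a) * magnet(a)**(eta*alpha) * exp(eta*q(a)) ]**(1/(1+eta*alpha))"""
    log_new = (np.log(policy + 1e-12)
               + eta * alpha * np.log(magnet + 1e-12)
               + eta * q_values) / (1.0 + eta * alpha)
    log_new -= log_new.max()             # numerical stability
    new_policy = np.exp(log_new)
    return new_policy / new_policy.sum()
```

With a uniform magnet the alpha term is entropy regularization, which is why the fixed points are quantal response equilibria rather than exact Nash equilibria.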
Efficient Φ-Regret Minimization in Extensive-Form Games via Online Mirror Descent
TLDR
An improved algorithm with balancing techniques is designed that achieves a sharp EFCE-regret bound under bandit feedback in an EFG with X information sets, A actions, and T episodes, which to the best of the authors' knowledge matches the information-theoretic lower bound.
Reinforcement Learning in Practice: Opportunities and Challenges
This article is a gentle discussion of the field of reinforcement learning in practice, covering opportunities and challenges across a broad range of topics, with perspectives and without technical details.

References

Showing 1-10 of 34 references
Human-Level Performance in No-Press Diplomacy via Equilibrium Search
TLDR
An agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via external regret minimization and achieves a rank of 23 out of 1,128 human players when playing anonymous games on a popular Diplomacy website is described.
No Press Diplomacy: Modeling Multi-Agent Gameplay
TLDR
This work focuses on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players, and presents DipNet, a neural-network-based policy model for No Press Diplomacy.
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
TLDR
Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
TLDR
This paper generalizes the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
Mastering the game of Go without human knowledge
TLDR
An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
Simultaneous Abstraction and Equilibrium Finding in Games
TLDR
This work introduces a method that combines abstraction with equilibrium finding by enabling actions to be added to the abstraction at run time, which allows an agent to begin learning with a coarse abstraction, and then to strategically insert actions at points that the strategy computed in the current abstraction deems important.
Grandmaster level in StarCraft II using multi-agent reinforcement learning
TLDR
The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
Mastering the game of Go with deep neural networks and tree search
TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
TLDR
P2SRO is introduced, the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games; it achieves state-of-the-art performance on Barrage Stratego and beats all existing bots.
Nash Q-Learning for General-Sum Stochastic Games
TLDR
This work extends Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games, and implements an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
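The Nash-Q update replaces the max in single-agent Q-learning with the Nash value of the next state's stage game. The tabular two-player sketch below treats solve_stage_game as a hypothetical placeholder for any stage-game equilibrium solver (a max-min linear program in the zero-sum special case); the learning rate and discount are arbitrary.

```python
def nash_q_update(Q1, Q2, state, a1, a2, r1, r2, next_state,
                  solve_stage_game, lr=0.1, gamma=0.95):
    """One tabular Nash-Q update (sketch). Q1[s] and Q2[s] are payoff matrices
    indexed by the joint action (a1, a2); `solve_stage_game` returns an
    equilibrium (pi1, pi2) of the stage game (Q1[s'], Q2[s']) at the next state."""
    pi1, pi2 = solve_stage_game(Q1[next_state], Q2[next_state])
    v1 = pi1 @ Q1[next_state] @ pi2      # player 1's Nash value of the next state
    v2 = pi1 @ Q2[next_state] @ pi2      # player 2's Nash value of the next state
    Q1[state][a1, a2] += lr * (r1 + gamma * v1 - Q1[state][a1, a2])
    Q2[state][a1, a2] += lr * (r2 + gamma * v2 - Q2[state][a1, a2])
```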
...
...