Corpus ID: 232232836

Online Double Oracle

Le Cong Dinh, Yaodong Yang, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou-Ammar, Jun Wang
Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method, Online Double Oracle (ODO), is provably convergent to a Nash…


Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
This work introduces a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design and is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker.
Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
This work summarizes previous concepts of diversity and works towards a unified measure of diversity in multi-agent open-ended learning that includes all elements in Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD).
On the Convergence of Fictitious Play: A Decomposition Approach
This paper derives new conditions for FP to converge by leveraging game decomposition techniques and develops a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable.
Efficient Policy Space Response Oracles
Theoretically, the solution procedures of EPSRO offer a monotonic improvement on the exploitability, which no existing PSRO method possesses, and it is proved that the no-regret optimization has a regret bound of $O\left(\sqrt{T \log[(k^2 + k)/2]}\right)$, where $k$ is the size of the restricted policy set.
Anytime Optimal PSRO for Two-Player Zero-Sum Games
Anytime Optimal Double Oracle (AODO), a tabular double oracle algorithm for 2-player zero-sum games that is guaranteed to converge to a Nash equilibrium while decreasing exploitability from iteration to iteration is proposed.
Measuring the Non-Transitivity in Chess
It is concluded that maintaining large and diverse populations of strategies is imperative to training effective AI agents for solving chess and the implications of non-transitivity for population-based training methods are investigated.
Neural Auto-Curricula
This paper introduces a novel framework—Neural Auto-Curricula (NAC)—that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design, and shows that NAC is able to generalise from small games to large games.


Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information
A no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium is developed and shown to be efficient against a large set of popular no-regret algorithms of the row player.
Open-ended Learning in Symmetric Zero-sum Games
A geometric framework for formulating agent objectives in zero-sum games is introduced, and a new algorithm (rectified Nash response, PSRO_rN) is developed that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms.
Adaptive game playing using multiplicative weights
A variant of the game-playing algorithm is proved to be optimal in a very strong sense and a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game.
On the Rate of Convergence of Fictitious Play
It is shown that, in all the classes of games mentioned above, fictitious play may require an exponential number of rounds (in the size of the representation of the game) before some equilibrium action is eventually played.
Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
P2SRO is introduced, the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games and is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.
A General Class of Adaptive Strategies
We exhibit and characterize an entire class of simple adaptive strategies, in the repeated play of a game, having the Hannan-consistency property: In the long-run, the player is guaranteed an average
Rational and Convergent Learning in Stochastic Games
This paper introduces two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence, and contributes a new learning algorithm, WoLF policy hillclimbing, that is proven to be rational.
Improved second-order bounds for prediction with expert advice
New and sharper regret bounds are derived for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule, expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds.
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous ones such as InRL.
The Nonstochastic Multiarmed Bandit Problem
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.