• Corpus ID: 216035997

Approximate exploitability: Learning a best response in large games

Finbarr Timbers, Edward Lockhart, Martin Schmid, Marc Lanctot, Michael H. Bowling
A common metric in games of imperfect information is exploitability, i.e. the performance of a policy against the worst-case opponent. This metric has many nice properties, but it is intractable to compute in large games because calculating a best response to the given policy requires a full search of the game tree. We introduce a new metric, approximate exploitability, which computes an analogue of exploitability using an approximate best response. This method scales to large games with…
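The exact metric that the abstract calls intractable in large games is easy to compute in a tiny one, which may help make the definition concrete. The following is a minimal sketch, assuming a two-player zero-sum matrix game (rock-paper-scissors, whose game value is 0). The names `best_response` and `exploitability` are illustrative and are not taken from the paper, and the sketch enumerates every opponent action, which is exactly the full search that the paper's approximate method is meant to avoid.

```python
from typing import List, Tuple

# Row player's payoffs in rock-paper-scissors (rows and columns: R, P, S).
# The game is zero-sum, so the column player receives the negation.
RPS: List[List[float]] = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def best_response(payoff: List[List[float]],
                  row_policy: List[float]) -> Tuple[int, float]:
    """Return the column that hurts the row policy most, and the row
    player's expected payoff against it (the worst-case value)."""
    n_cols = len(payoff[0])
    column_values = [
        sum(p * payoff[i][j] for i, p in enumerate(row_policy))
        for j in range(n_cols)
    ]
    worst_col = min(range(n_cols), key=lambda j: column_values[j])
    return worst_col, column_values[worst_col]


def exploitability(payoff: List[List[float]],
                   row_policy: List[float],
                   game_value: float = 0.0) -> float:
    """Gap between the game value and the policy's worst-case payoff.

    game_value defaults to 0, which is correct for rock-paper-scissors
    but is an assumption for any other payoff matrix."""
    _, worst_case = best_response(payoff, row_policy)
    return game_value - worst_case
```

Under these assumptions, the uniform policy has exploitability 0, while a policy that always plays rock is exploited by paper and has exploitability 1.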

OpenSpiel: A Framework for Reinforcement Learning in Games

This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.

Equilibrium Approximation Quality of Current No-Limit Poker Bots

This paper presents a simple and computationally inexpensive Local Best Response method for computing an approximate lower bound on the value of the best response strategy, and shows that existing poker-playing programs are remarkably poor Nash equilibrium approximations.

Solving Heads-Up Limit Texas Hold'em

The engineering details required to make Cepheus solve heads-up limit Texas hold'em poker are described in detail, and the theoretical soundness of CFR+ and its component algorithm, regret-matching+, is proved.

Information Set Monte Carlo Tree Search

Three new information set MCTS (ISMCTS) algorithms are presented which handle different sources of hidden information and uncertainty in games. Instead of searching minimax trees of game states, the ISMCTS algorithms search trees of information sets, more directly analyzing the true structure of the game.

Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing

Slumbot NL is a heads-up no-limit hold'em poker bot built with a distributed disk-based implementation of counterfactual regret minimization (CFR), enabling it to solve a large abstraction on commodity hardware in a cost-effective fashion.

PACHI: State of the Art Open Source Go Program

A state-of-the-art implementation of the Monte Carlo Tree Search algorithm for the game of Go is described, along with three notable original improvements: an adaptive time control algorithm, dynamic komi, and the use of the criticality statistic.

Deep Blue


… and Go through self-play. Science, 362(6419):1140–1144, 2018.

… and Martin M. Zinkevich. Accelerating best response calculation in large extensive games. In IJCAI, 2011.