• Publications
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
TLDR
We introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions.
  • 212
  • 37
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
TLDR
Computer code based on continual problem re-solving beats human professional poker players at a two-player variant of poker.
  • 383
  • 22
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
TLDR
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
  • 46
  • 9
Probabilistic State Translation in Extensive Games with Large Action Sets
TLDR
We use probabilistic mapping to translate a real action into a probability distribution over actions, whose weights are determined by a similarity metric.
  • 40
  • 7
Intensive Case Management Before and After Prison Release is No More Effective Than Comprehensive Pre-Release Discharge Planning in Linking HIV-Infected Prisoners to Care: A Randomized Trial
Imprisonment provides opportunities for the diagnosis and successful treatment of HIV; however, the benefits of antiretroviral therapy are frequently lost following release due to suboptimal access…
  • 119
  • 6
The Hanabi Challenge: A New Frontier for AI Research
TLDR
We propose the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information.
  • 64
  • 6
Generalization and Regularization in DQN
TLDR
We evaluate the generalization capabilities of DQN, one of the most traditional deep RL algorithms in the field.
  • 50
  • 6
Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games
In this thesis, we investigate the problem of decision-making in large two-player zero-sum games using Monte Carlo sampling and regret minimization methods. We demonstrate four major contributions.
  • 32
  • 5
No-Regret Learning in Extensive-Form Games with Imperfect Recall
TLDR
In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall.
  • 65
  • 4
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
TLDR
We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees.
  • 71
  • 3