• Publications
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
This paper takes a big-picture look at how the ALE is used by the research community, examines how diverse evaluation methodologies on the platform have become, and highlights key concerns when evaluating agents on it.
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
DeepStack is introduced, an algorithm for imperfect-information settings that combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning.
The Hanabi Challenge: A New Frontier for AI Research
It is argued that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground, and that developing novel techniques for such theory-of-mind reasoning will be crucial not only for success in Hanabi but also in broader collaborative efforts, especially those with human partners.
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
The Bayesian action decoder (BAD) is presented: a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief conditioned on the actions taken by all agents in the environment.
Count-Based Exploration with the Successor Representation
A simple approach to exploration in reinforcement learning (RL) that yields theoretically justified algorithms in the tabular case, extends to settings where function approximation is required, and achieves state-of-the-art performance in Atari 2600 games in the low sample-complexity regime.
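The core idea above can be sketched in a tabular setting: the successor representation (SR) is learned by temporal-difference updates, and the norm of a state's SR row acts as a count-like novelty signal, so rarely visited states receive a larger exploration bonus. This is a minimal illustration, not the paper's implementation; the bonus coefficient `beta`, the learning rate, and the chain environment are all assumptions made for the sketch.

```python
import numpy as np

n_states = 5
gamma = 0.9   # discount factor for the SR
alpha = 0.1   # TD learning rate

# SR matrix: psi[s] estimates expected discounted future visit counts from s.
psi = np.zeros((n_states, n_states))

def update_sr(s, s_next):
    # TD update toward one-hot(s) + gamma * psi[s_next].
    one_hot = np.eye(n_states)[s]
    td_target = one_hot + gamma * psi[s_next]
    psi[s] += alpha * (td_target - psi[s])

def exploration_bonus(s, beta=0.05):
    # States visited rarely have small SR norm, hence a larger bonus.
    return beta / (np.linalg.norm(psi[s], ord=1) + 1e-8)

# Repeatedly traverse a chain 0 -> 1 -> ... -> 3; state 4 is never left,
# so its SR row stays near zero and it keeps a large bonus.
for _ in range(100):
    for s in range(n_states - 1):
        update_sr(s, s + 1)
```

After training, `exploration_bonus(4)` far exceeds `exploration_bonus(0)`, directing the agent toward the under-explored end of the chain.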
Probabilistic State Translation in Extensive Games with Large Action Sets
Equilibrium or near-equilibrium solutions to very large extensive form games are often computed by using abstractions to reduce the game size. A common abstraction technique for games with a large …
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
This paper examines the role of policy gradient and actor-critic algorithms in partially-observable multiagent environments and relates them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees.
Generalization and Regularization in DQN
Although regularization is largely underutilized in deep RL, it is shown that it can in fact help DQN learn more general features, which can then be reused and fine-tuned on similar tasks, considerably improving the sample efficiency of DQN.
Monte carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games
In this thesis, we investigate the problem of decision-making in large two-player zero-sum games using Monte Carlo sampling and regret minimization methods. We demonstrate four major contributions.
No-Regret Learning in Extensive-Form Games with Imperfect Recall
This paper presents the first regret bound for CFR when applied to a general class of games with imperfect recall and shows how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
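The regret-minimization machinery underlying CFR can be illustrated with its per-decision update, regret matching: each player accumulates regret for every action and plays in proportion to positive regret, and the players' average strategies converge toward equilibrium. A minimal sketch on rock-paper-scissors (perfect recall, one decision point) rather than the paper's imperfect-recall domains; the iteration count and the symmetry-breaking initialization are choices made for this example.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (zero-sum).
payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def strategy_from_regret(regret):
    # Regret matching: play proportionally to positive regret,
    # falling back to uniform when no regret is positive.
    pos = np.maximum(regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3

# Break the initial symmetry so play actually moves; starting both
# players exactly uniform, all expected regrets would stay zero.
r1 = np.array([1.0, 0.0, 0.0])
r2 = np.zeros(3)
sum1 = np.zeros(3)
sum2 = np.zeros(3)

iters = 100_000
for _ in range(iters):
    s1 = strategy_from_regret(r1)
    s2 = strategy_from_regret(r2)
    sum1 += s1
    sum2 += s2
    u1 = payoff @ s2        # expected payoff of each row action vs s2
    u2 = -(s1 @ payoff)     # expected payoff of each column action vs s1
    r1 += u1 - s1 @ u1      # regret: action value minus current value
    r2 += u2 - s2 @ u2

avg1 = sum1 / iters  # average strategy, the quantity that converges
```

The average strategy `avg1` approaches the uniform Nash equilibrium (1/3, 1/3, 1/3); CFR applies this same update at every information set of an extensive-form game, weighted by reach probabilities.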