Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

@article{Schrittwieser2020MasteringAG,
  title={Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model},
  author={Julian Schrittwieser and Ioannis Antonoglou and Thomas Hubert and Karen Simonyan and Laurent Sifre and Simon Schmitt and Arthur Guez and Edward Lockhart and Demis Hassabis and Thore Graepel and Timothy P. Lillicrap and David Silver},
  journal={Nature},
  year={2020},
  volume={588},
  number={7839},
  pages={604--609}
}
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman…
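To make the abstract's central idea concrete, here is a minimal sketch of planning with a learned model: three toy stand-ins for MuZero's representation, dynamics and prediction functions drive a small UCB-guided tree search. Everything below (the linear "networks", constants, and the simplified backup) is an illustrative assumption, not the paper's implementation.

import math
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, LATENT = 3, 4

W_h = rng.normal(size=(LATENT, 8))                   # h: observation -> latent state
W_g = rng.normal(size=(N_ACTIONS, LATENT, LATENT))   # g: (state, action) -> next state
W_f = rng.normal(size=(N_ACTIONS, LATENT))           # f: state -> policy logits

def represent(obs):                  # h(o) = s_0
    return np.tanh(W_h @ obs)

def dynamics(state, action):         # g(s, a) = (r, s')
    nxt = np.tanh(W_g[action] @ state)
    return float(nxt.sum()), nxt     # toy reward: sum of next latent state

def predict(state):                  # f(s) = (p, v)
    logits = W_f @ state
    p = np.exp(logits - logits.max())
    return p / p.sum(), float(state.mean())   # toy value: mean of latent state

class Node:
    def __init__(self, state, prior):
        self.state, self.prior = state, prior
        self.children, self.visits, self.value_sum = {}, 0, 0.0
    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def ucb(parent, child, c=1.25):      # PUCT-style score: value plus prior-weighted bonus
    return child.value() + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def search(obs, simulations=50):
    root = Node(represent(obs), prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        while node.children:         # selection: follow UCB scores to a leaf
            _, node = max(node.children.items(), key=lambda kv: ucb(path[-1], kv[1]))
            path.append(node)
        policy, value = predict(node.state)
        for a in range(N_ACTIONS):   # expansion: one step of the learned dynamics per action
            _, s = dynamics(node.state, a)
            node.children[a] = Node(s, prior=float(policy[a]))
        for n in path:               # backup (simplified: intermediate rewards are ignored)
            n.visits += 1
            n.value_sum += value
    return max(root.children, key=lambda a: root.children[a].visits)

print("chosen action:", search(rng.normal(size=8)))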

Citations

Continuous Control for Searching and Planning with a Learned Model
TLDR: This paper provides a method, together with the necessary theoretical results, for extending the MuZero algorithm to environments with continuous action spaces, and shows that the proposed algorithm outperforms soft actor-critic (SAC), a state-of-the-art model-free deep reinforcement learning algorithm.
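A common way to make tree search tractable over a continuous action space, and plausibly the kind of reduction such an extension needs, is to sample a small candidate set of actions from a learned policy at each node and search over that finite set. The sketch below assumes a Gaussian policy head; all names and the candidate count are illustrative, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def gaussian_policy(state):
    # Stand-in for a learned policy head: mean and std per action dimension.
    return np.tanh(state[:2]), 0.5 * np.ones(2)

def candidate_actions(state, k=8):
    # Reduce the continuous space to k sampled candidates for tree search.
    mean, std = gaussian_policy(state)
    return [rng.normal(mean, std) for _ in range(k)]

state = rng.normal(size=4)
for a in candidate_actions(state):
    print(np.round(a, 3))   # each candidate is then treated as a discrete arm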
Learning to Play: Reinforcement Learning and Games
  • A. Plaat
  • Computer Science, Psychology
  • 2020
TLDR: It is shown that the methods generalize to three games, hinting at artificial general intelligence; an argument can be made that in doing so the authors failed the Turing test, since no human can play at this level.
PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
TLDR: This work proposes PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards, and indicates that it can achieve comparable performance while being around 4–8 times more sample efficient.
Exploiting Bias for Cooperative Planning in Multi-Agent Tree Search
TLDR: A novel method of tree search based on a mixture of the individual and joint action spaces is introduced; it can be interpreted as a cascading effect in which agents are biased by exploration of new actions, exploitation of previously profitable ones, and recommendations provided by deep neural networks.
Hierarchical Width-Based Planning and Learning
TLDR: This paper presents a hierarchical algorithm that plans at two levels of abstraction: a high-level planner uses abstract features that are incrementally discovered from low-level pruning decisions. The algorithm is illustrated in classical planning PDDL domains as well as in pixel-based simulator domains.
On the role of planning in model-based deep reinforcement learning
TLDR: This paper studies the performance of MuZero, a state-of-the-art model-based reinforcement learning algorithm with strong connections to, and overlapping components with, many other model-based RL algorithms, and suggests that planning alone is insufficient to drive strong generalization.
Bootstrapped model learning and error correction for planning with uncertainty in model-based RL
TLDR: This paper proposes a bootstrapped multi-headed neural network that learns the distribution of future states and rewards, and introduces a global error correction filter that applies high-level constraints guided by the context provided through the predictive distribution.
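As a rough illustration of the bootstrapped multi-headed idea, the sketch below shares one body across several heads so that head disagreement serves as an uncertainty estimate over predicted next states and rewards. The architecture, sizes, and names are assumptions for the example, not the paper's design, and the weights are untrained.

import numpy as np

rng = np.random.default_rng(0)
STATE, N_HEADS = 4, 5

class BootstrappedModel:
    def __init__(self):
        self.body = rng.normal(scale=0.3, size=(8, STATE + 1))            # shared trunk
        self.heads = rng.normal(scale=0.3, size=(N_HEADS, STATE + 1, 8))  # one head per bootstrap resample

    def predict(self, state, action):
        z = np.tanh(self.body @ np.append(state, action))
        outs = np.array([h @ z for h in self.heads])   # (N_HEADS, STATE + 1)
        mean, spread = outs.mean(axis=0), outs.std(axis=0)
        # First STATE entries: next state; last entry: reward; spread: head disagreement.
        return mean[:STATE], mean[STATE], spread

model = BootstrappedModel()
s_next, r, sigma = model.predict(rng.normal(size=STATE), action=1.0)
print("predicted reward:", round(float(r), 3), "| max disagreement:", round(float(sigma.max()), 3))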
Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively…
Rethinking Formal Models of Partially Observable Multiagent Decision Making
TLDR: This paper proves that any timeable perfect-recall extensive-form game (EFG) can be efficiently modeled as a factored-observation game (FOG), relates FOGs to other existing formalisms, and presents the two building blocks of these breakthroughs, counterfactual regret minimization and public state decomposition, in the new formalism.
Agent57: Outperforming the Atari Human Benchmark
TLDR: This work proposes Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games; it trains a neural network that parameterizes a family of policies ranging from very exploratory to purely exploitative.
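The "family of policies" idea can be illustrated with a toy parameterization: each policy index j maps to an intrinsic-reward weight beta_j and a discount gamma_j, spanning exploratory to exploitative behaviour. The specific values and wiring below are illustrative, not the published configuration.

import numpy as np

N_POLICIES = 4
betas = np.geomspace(0.3, 1e-3, N_POLICIES)    # high beta -> strongly exploratory
gammas = np.linspace(0.97, 0.997, N_POLICIES)  # longer horizons for exploiters

def mixed_reward(r_extrinsic, r_intrinsic, j):
    # Reward seen by policy j: extrinsic plus beta_j-weighted intrinsic bonus.
    return r_extrinsic + betas[j] * r_intrinsic

for j in range(N_POLICIES):
    print(f"policy {j}: beta={betas[j]:.4f} gamma={gammas[j]:.3f} "
          f"reward={mixed_reward(1.0, 0.5, j):.3f}")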

References

Showing 1–10 of 56 references
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
TLDR: This paper generalizes the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world-champion program in the games of chess and shogi (Japanese chess), as well as Go.
Grandmaster level in StarCraft II using multi-agent reinforcement learning
TLDR: The agent AlphaStar, which uses a multi-agent reinforcement learning algorithm, is evaluated and shown to have reached Grandmaster level, ranking among the top 0.2% of human players in the real-time strategy game StarCraft II.
Mastering the game of Go without human knowledge
TLDR: An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
TLDR: The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Observe and Look Further: Achieving Consistent Performance on Atari
TLDR: This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.
TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning
TLDR: TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value-function network in deep RL with discrete actions, is proposed, along with ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.
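A compact sketch of the TreeQN idea: embed the observation, unroll a learned transition function for every action to a fixed depth, and back up learned rewards and values with a max, all inside one differentiable forward pass. Toy linear maps stand in for the learned modules, and the max backup is a simplification of the paper's options.

import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, LATENT, DEPTH = 3, 4, 2

W_enc = rng.normal(scale=0.4, size=(LATENT, 8))                     # observation encoder
W_trans = rng.normal(scale=0.4, size=(N_ACTIONS, LATENT, LATENT))   # per-action transition
w_reward = rng.normal(scale=0.4, size=(N_ACTIONS, LATENT))          # per-action reward head
w_value = rng.normal(scale=0.4, size=LATENT)                        # leaf value head

def tree_backup(z, depth):
    # Backed-up value of latent state z after `depth` steps of lookahead.
    if depth == 0:
        return float(w_value @ z)
    return max(float(w_reward[a] @ z) + tree_backup(np.tanh(W_trans[a] @ z), depth - 1)
               for a in range(N_ACTIONS))

def q_values(obs):
    z = np.tanh(W_enc @ obs)
    return [float(w_reward[a] @ z) + tree_backup(np.tanh(W_trans[a] @ z), DEPTH - 1)
            for a in range(N_ACTIONS)]

print("Q estimates:", np.round(q_values(rng.normal(size=8)), 3))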
Mastering the game of Go with deep neural networks and tree search
TLDR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
TLDR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
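A minimal rendering of the options framework: an option bundles an initiation set, an intra-option policy, and a termination condition, and can be invoked wherever a primitive action could be. The corridor environment below is a placeholder for a real MDP.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set I
    policy: Callable[[int], int]        # intra-option policy pi(s) -> primitive action
    terminates: Callable[[int], bool]   # termination condition beta(s)

def run_option(opt: Option, state: int, step: Callable[[int, int], int]) -> int:
    # Execute an option until its termination condition fires; return the final state.
    assert opt.can_start(state)
    while not opt.terminates(state):
        state = step(state, opt.policy(state))
    return state

# Toy 1-D corridor: "walk right until reaching state 5".
walk_right = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: +1,
    terminates=lambda s: s >= 5,
)
print("option ended at state:", run_option(walk_right, 0, step=lambda s, a: s + a))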
Model-Based Reinforcement Learning for Atari
TLDR: Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described, and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.
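The SimPLe recipe alternates between collecting real experience, fitting a video-prediction world model, and training the policy purely inside that model. The loop below captures only that alternation; every component is a trivial stub, and all names are invented for the example.

class Stub:
    # Trivial stand-ins for the world model and the policy.
    def fit(self, data): pass               # world-model training step
    def rollout(self, policy): return []    # imagined trajectories
    def update(self, batch): pass           # model-free RL update

def collect(env, policy, n=10):
    return [env] * n                        # placeholder for real environment frames

def simple_loop(env, world_model, policy, iterations=2, updates=4):
    for it in range(iterations):
        world_model.fit(collect(env, policy))           # 1. fit model on real data
        for _ in range(updates):
            policy.update(world_model.rollout(policy))  # 2. train policy inside the model
        print(f"iteration {it} done")
    return policy

simple_loop(env="atari-stub", world_model=Stub(), policy=Stub())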
Learning Latent Dynamics for Planning from Pixels
TLDR: The Deep Planning Network (PlaNet) is proposed: a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space, using a latent dynamics model with both deterministic and stochastic transition components.
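A small sketch of a latent dynamics model with both deterministic and stochastic transition components, in the spirit of the summary above: toy linear maps replace the learned recurrent network, and all sizes are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
DET, STO, ACT = 4, 3, 2   # deterministic, stochastic, action sizes

W_det = rng.normal(scale=0.3, size=(DET, DET + STO + ACT))
W_mu = rng.normal(scale=0.3, size=(STO, DET))
W_logstd = rng.normal(scale=0.3, size=(STO, DET))

def transition(h, z, a):
    # Deterministic path h' = f(h, z, a); stochastic path z' ~ N(mu(h'), sigma(h')).
    h_next = np.tanh(W_det @ np.concatenate([h, z, a]))
    mu, std = W_mu @ h_next, np.exp(0.5 * (W_logstd @ h_next))
    return h_next, mu + std * rng.normal(size=STO)

h, z = np.zeros(DET), np.zeros(STO)
for t in range(3):        # imagine a short trajectory entirely in latent space
    h, z = transition(h, z, a=rng.normal(size=ACT))
    print(f"t={t} z={np.round(z, 3)}")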