A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy P. Lillicrap, Karen Simonyan, Demis Hassabis. Science, pp. 1140–1144.
The game of chess is the longest-studied domain in the history of artificial intelligence. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world-champion program in each of chess, shogi (Japanese chess), and Go.

Impartial Games: A Challenge for Reinforcement Learning

The first concrete example of a game that appears to be a stumbling block for AlphaZero and similar reinforcement learning algorithms is presented, namely the children's game of nim, along with other impartial games; stronger bottlenecks are identified than previously suggested.
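Nim is a telling benchmark here because its optimal strategy is known in closed form (Bouton's theorem): the first player wins exactly when the XOR ("nim-sum") of the pile sizes is non-zero, and the winning move makes the nim-sum zero. A minimal sketch of that perfect-play rule, against which a learned agent's moves could be checked (function name and representation are illustrative):

```python
from functools import reduce
from operator import xor

def nim_winning_move(piles):
    """Return (pile_index, new_size) for a winning move, or None if losing.

    By Bouton's theorem, the first player wins iff the nim-sum of the
    pile sizes is non-zero; a winning move makes the nim-sum zero.
    """
    nim_sum = reduce(xor, piles, 0)
    if nim_sum == 0:
        return None  # every move hands the opponent a winning position
    for i, pile in enumerate(piles):
        target = pile ^ nim_sum
        if target < pile:
            return (i, target)  # shrink pile i to `target`

print(nim_winning_move([3, 4, 5]))  # → (0, 1): shrinking pile 0 zeroes the nim-sum
```

Because the exact value of every nim position is computable this cheaply, deviations of a trained agent from perfect play are directly measurable, which is what makes impartial games a sharp test case.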

Mastering board games

Silver et al. (6) show that a generalization of this approach is effective across a variety of games: their AlphaZero system learned to play three challenging games at the highest level of play yet seen.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

Assessing Popular Chess Variants Using Deep Reinforcement Learning

It is shown that several chess variants can be trained with the same reinforcement learning setup, producing the first multi-variant chess engine to use Monte Carlo tree search; the engine has become the world's strongest in the game of Horde and the second strongest in Crazyhouse, behind its sibling CrazyAra.

Learning to Play: Reinforcement Learning and Games

It is shown that the methods generalize to three games, hinting at artificial general intelligence; arguably, in doing so the resulting programs fail the Turing test, since no human can play at this level.

A Survey of Planning and Learning in Games

This paper surveys the methodologies proposed to integrate planning and learning in the context of games, covering both their theoretical foundations and their applications, and reviews the learning and planning techniques commonly used in games.

Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data

Improvements include modifications to the neural network design and training configuration, the introduction of a data normalization step, and a more sample-efficient Monte Carlo tree search with a lower chance of blundering.

Learning self-play agents for combinatorial optimization problems

This paper explores neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level, and proposes the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems.

Learning Self-Game-Play Agents for Combinatorial Optimization Problems

The Zermelo Gamification (ZG) is proposed, following the idea of Hintikka's Game-Theoretical Semantics, to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems.
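The selection rule at the heart of such neural MCTS can be sketched with the PUCT formula used by AlphaZero-style engines, which scores each child by its mean search value plus a prior-weighted exploration bonus. A minimal sketch; the dictionary-based node representation and the constant are illustrative, not the papers' implementation:

```python
import math

def puct_select(children, c_puct=1.5):
    """AlphaZero-style PUCT: pick the child maximizing Q + U, where
    U = c_puct * prior * sqrt(total_visits) / (1 + child_visits).

    Each child is a dict with a network prior `p`, visit count `n`,
    and accumulated value `w` (illustrative field names).
    """
    total = sum(ch["n"] for ch in children)
    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] > 0 else 0.0  # mean value so far
        u = c_puct * ch["p"] * math.sqrt(total) / (1 + ch["n"])
        return q + u
    return max(children, key=score)
```

The prior term is what distinguishes this from plain UCB1: a high-prior, unvisited move is explored early even before it has any search value, which is how the policy network steers the tree.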



Mastering the game of Go without human knowledge

An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Mastering the game of Go with deep neural networks and tree search

Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.


B. Pell, Computational Intelligence, 1996
Besides being the first Metagame‐playing program, this is the first program to have derived useful piece values directly from analysis of the rules of different games.

Learning to Play Chess Using Temporal Differences

TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search, is presented, and it is investigated whether it can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
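The update underlying TDLEAF(λ) can be sketched as follows: the evaluation is taken at the leaf of each position's principal variation, and its parameters are nudged toward the λ-discounted sum of subsequent temporal differences. A toy one-parameter sketch; names and shapes are illustrative, not the paper's implementation:

```python
def tdleaf_updates(leaf_values, leaf_grads, alpha=0.01, lam=0.7):
    """One TDLEAF(λ)-style pass over a game.

    leaf_values: search value of each position's PV leaf, in game order.
    leaf_grads:  gradient of the evaluation w.r.t. its parameter at each
                 leaf (scalars here, for a one-parameter sketch).
    Returns the accumulated parameter update.
    """
    n = len(leaf_values)
    update = 0.0
    for t in range(n - 1):
        # sum of future temporal differences, discounted by lambda
        td_sum = sum(
            lam ** (j - t) * (leaf_values[j + 1] - leaf_values[j])
            for j in range(t, n - 1)
        )
        update += alpha * leaf_grads[t] * td_sum
    return update
```

With λ = 1 every position is pushed toward the game's final leaf value; with λ = 0 only the next temporal difference matters, which is the usual bias/variance dial of TD methods.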

Computer shogi

Giraffe: Using Deep Reinforcement Learning to Play Chess

Giraffe is the most successful attempt thus far at using end-to-end machine learning to play chess, with minimal hand-crafted knowledge given by the programmer.

Bootstrapping from Game Tree Search

This paper introduces a new algorithm for updating the parameters of a heuristic evaluation function by adjusting the heuristic toward the values computed by an alpha-beta search; the algorithm is implemented in the chess program Meep, using a linear heuristic function.
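The update described here can be sketched as a gradient step that moves a linear evaluation toward the deeper alpha-beta search's minimax value for the same position. A minimal sketch with illustrative names, assuming a squared-error objective:

```python
def search_bootstrap_update(weights, features, search_value, eta=0.001):
    """One bootstrapping step: move a linear evaluation toward the
    value returned by an alpha-beta search of the same position.

    weights, features: parallel lists defining eval = sum(w * f).
    search_value: minimax value from the (deeper) alpha-beta search.
    Returns the updated weight list.
    """
    eval_value = sum(w * f for w, f in zip(weights, features))
    error = search_value - eval_value
    # gradient step on the squared error (search_value - eval_value)^2
    return [w + eta * error * f for w, f in zip(weights, features)]
```

Unlike TD-style updates, the target here is not a future position's value but the search result for the current position, so every searched node can supply a training signal.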

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.