Mastering the game of Go with deep neural networks and tree search

@article{Silver2016MasteringTG,
  title={Mastering the game of Go with deep neural networks and tree search},
  author={David Silver and Aja Huang and Chris J. Maddison and Arthur Guez and Laurent Sifre and George van den Driessche and Julian Schrittwieser and Ioannis Antonoglou and Vedavyas Panneershelvam and Marc Lanctot and Sander Dieleman and Dominik Grewe and John Nham and Nal Kalchbrenner and Ilya Sutskever and Timothy P. Lillicrap and Madeleine Leach and Koray Kavukcuoglu and Thore Graepel and Demis Hassabis},
  journal={Nature},
  year={2016},
  volume={529},
  pages={484-489}
}
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. […] Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this…
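The combination described above, Monte Carlo tree search guided by policy and value networks, is commonly expressed through a PUCT-style selection rule in which the policy network supplies a prior over moves and backed-up value estimates fill in the exploitation term. The sketch below illustrates that rule only; the `Node` fields, the constant `c_puct`, and the toy priors are assumptions for illustration, not the paper's implementation.

```python
import math

class Node:
    """One search-tree node; children are keyed by move."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a): prior probability from the policy network
        self.visits = 0           # N(s, a): visit count
        self.value_sum = 0.0      # sum of backed-up value evaluations
        self.children = {}

    def q(self):
        """Mean backed-up value, Q(s, a)."""
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    """Exploitation term Q plus a prior-weighted exploration bonus."""
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u

def select_move(node, c_puct=1.5):
    """During tree descent, pick the child move maximizing the PUCT score."""
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1], c_puct))[0]

# Toy example: two candidate moves whose priors would come from the policy network.
root = Node(prior=1.0)
root.visits = 10
root.children = {"D4": Node(prior=0.7), "Q16": Node(prior=0.3)}
root.children["D4"].visits, root.children["D4"].value_sum = 6, 3.6
root.children["Q16"].visits, root.children["Q16"].value_sum = 4, 2.8
print(select_move(root))
```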
Playing Go without Game Tree Search Using Convolutional Neural Networks
TLDR
This work attempts to mimic human intuition in the game of Go by creating a convolutional neural policy network which, without any sort of tree search, should play the game at or above the level of most humans.
GoGoGo : Improving Deep Neural Network Based Go Playing AI with Residual Networks
TLDR
AlphaGo, a Go-playing AI built by Google DeepMind, used a new approach of combining deep neural networks with tree search to solve the Go playing problem and defeated the 18-time Go world champion Lee Sedol.
Mastering the game of Go without human knowledge
TLDR
An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
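As context for the summary above, self-play reinforcement learning of this kind is typically organized as a loop: the current network plays games against itself, each position is labelled with the game's final outcome, and the network is then trained on those labelled positions. The outline below only shows that loop structure with trivial stand-ins for the game and the model; every function in it is a placeholder, not the published algorithm.

```python
import random

def self_play_game(model):
    """Play one toy game with the current model and label every position with
    the final outcome (+1 win, -1 loss). The 'positions' and the win probability
    are placeholders for a real game and network."""
    positions = [random.random() for _ in range(20)]
    outcome = 1.0 if random.random() < model["strength"] else -1.0
    return [(pos, outcome) for pos in positions]

def train_step(model, batch, lr=0.01):
    """Stand-in training update: move the model toward the mean outcome
    observed in its own recent games."""
    mean_outcome = sum(z for _, z in batch) / len(batch)
    model["strength"] = min(1.0, max(0.0, model["strength"] + lr * mean_outcome))
    return model

model = {"strength": 0.5}      # placeholder for network parameters
replay_buffer = []
for iteration in range(200):
    replay_buffer.extend(self_play_game(model))        # generate data by self-play
    model = train_step(model, replay_buffer[-1000:])    # train on recent positions
print(model)
```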
Learning Self-Game-Play Agents for Combinatorial Optimization Problems
TLDR
The Zermelo Gamification (ZG) is proposed, following the idea of Hintikka's Game-Theoretical Semantics, to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem.
Assessing Popular Chess Variants Using Deep Reinforcement Learning
TLDR
It is shown that several chess variants can be trained with the same reinforcement learning setup, producing the very first multi-variant chess engine that utilizes Monte Carlo tree search, which has become the world's strongest engine in the game of Horde and the second strongest, behind its sibling CrazyAra, in Crazyhouse.
Google AI algorithm masters ancient game of Go
TLDR
To interpret Go boards and to learn the best possible moves, the AlphaGo program applied deep learning in neural networks: brain-inspired programs in which connections between layers of simulated neurons are strengthened through examples and experience.
Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data
TLDR
Improvements include modifications to the neural network design and training configuration, the introduction of a data normalization step, and a more sample-efficient Monte Carlo tree search that has a lower chance of blundering.
High-performance Algorithms using Deep Learning in Turn-based Strategy Games
TLDR
Methods that can accelerate the learning of neural networks by solving the problem of data representation for the networks using Monte Carlo Tree Search (MCTS) are discussed.
Dual Monte Carlo Tree Search
TLDR
Dual MCTS is proposed, which uses two different search trees, a single deep neural network, and a new update technique for the search trees that combines PUCB, a sliding window, and the ε-greedy algorithm to reduce the number of updates to the tree.
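The update technique summarized above mixes a PUCB selection score with an ε-greedy choice. A minimal sketch of how such a mixture is often written is shown below; the interface (`children` as a map from move to a `(q, prior, visits)` tuple) and the constants are assumptions, and the cited paper's sliding-window component is not reproduced.

```python
import math
import random

def pucb(q, prior, parent_visits, child_visits, c=1.5):
    """PUCB score: value estimate plus a prior-weighted exploration bonus."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def epsilon_greedy_pucb(children, parent_visits, epsilon=0.1, c=1.5):
    """With probability epsilon explore a uniformly random child,
    otherwise select the child with the highest PUCB score."""
    if random.random() < epsilon:
        return random.choice(list(children))

    def score(move):
        q, prior, visits = children[move]
        return pucb(q, prior, parent_visits, visits, c=c)

    return max(children, key=score)

# Toy example: three candidate moves given as (mean value, prior, visit count).
children = {"a": (0.6, 0.5, 12), "b": (0.4, 0.3, 5), "c": (0.0, 0.2, 0)}
print(epsilon_greedy_pucb(children, parent_visits=17))
```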
Combining Off and On-Policy Training in Model-Based Reinforcement Learning
TLDR
This work proposes a way to obtain off-policy targets using data from simulated games in MuZero, and shows that these targets can speed up the training process and lead to faster convergence and higher rewards than the ones obtained by MuZero.
...
...

References

SHOWING 1-10 OF 81 REFERENCES
Mastering the game of Go without human knowledge
TLDR
An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
Move Evaluation in Go Using Deep Convolutional Neural Networks
TLDR
A large 12-layer convolutional neural network is trained by supervised learning from a database of human professional games; it beats the traditional search program GnuGo in 97% of games and matches the performance of a state-of-the-art Monte Carlo tree search that simulates a million positions per move.
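The network described above is a deep convolutional classifier over board positions trained to predict the expert's move. The PyTorch sketch below shows the general shape of such a model and one supervised training step; the three-layer trunk, channel width, input planes, and random placeholder data are illustrative assumptions, not the cited 12-layer architecture.

```python
import torch
import torch.nn as nn

class MovePredictionNet(nn.Module):
    """Small convolutional policy network: board feature planes in, move logits out."""
    def __init__(self, in_planes=4, width=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(width, 1, kernel_size=1)   # one logit per board point

    def forward(self, x):
        return self.head(self.trunk(x)).flatten(1)       # (batch, 19*19) move logits

# One supervised training step on (position, expert move) pairs, as in the summary above.
net = MovePredictionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
positions = torch.randn(8, 4, 19, 19)                    # placeholder feature planes
expert_moves = torch.randint(0, 19 * 19, (8,))           # placeholder expert labels
loss = nn.functional.cross_entropy(net(positions), expert_moves)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```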
Training Deep Convolutional Neural Networks to Play Go
TLDR
The convolutional neural networks trained in this work can consistently defeat the well-known Go program GNU Go and win some games against the state-of-the-art Go-playing program Fuego while using a fraction of the play time.
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
TLDR
This paper generalizes the approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and the resulting program convincingly defeated a world-champion program in the games of chess and shogi (Japanese chess) as well as Go.
Temporal Difference Learning of Position Evaluation in the Game of Go
TLDR
This work demonstrates a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning, based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play.
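Temporal-difference learning of position evaluation, as described above, moves each position's predicted value toward the prediction for the following position, with the final game outcome supplying the terminal target. The sketch below shows a TD(0) update for a linear value function; the random feature vectors standing in for board positions are placeholders, not the cited network architecture or training protocol.

```python
import numpy as np

# TD(0) sketch: learn a linear position-evaluation function V(s) = w . phi(s) by
# moving each position's value toward the value of the next position, with the
# final game outcome as the terminal target.

rng = np.random.default_rng(0)
n_features = 32
w = np.zeros(n_features)

def td0_update(trace, outcome, w, alpha=0.05, gamma=1.0):
    """One pass over a game: trace is a list of feature vectors, outcome is +/-1."""
    for t, phi in enumerate(trace):
        v_t = w @ phi
        v_next = outcome if t == len(trace) - 1 else w @ trace[t + 1]
        w = w + alpha * (gamma * v_next - v_t) * phi   # TD error times feature gradient
    return w

for game in range(200):
    trace = [rng.random(n_features) for _ in range(20)]   # placeholder positions
    outcome = rng.choice([-1.0, 1.0])                     # final result for the learner
    w = td0_update(trace, outcome, w)
print(np.round(w[:4], 3))
```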
Temporal-difference search in computer Go
TLDR
This work applies temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones, and outperforms an unenhanced Monte Carlo tree search with the same number of simulations.
Bayesian pattern ranking for move prediction in the game of Go
TLDR
A probability distribution over legal moves for professional play in a given position in Go is obtained, and it shows excellent prediction performance, correctly predicting the moves made by professional Go players in 34% of test positions.
Bootstrapping from Game Tree Search
TLDR
This paper introduces a new algorithm for updating the parameters of a heuristic evaluation function by moving the heuristic towards the values computed by an alpha-beta search, and implements this algorithm in the chess program Meep using a linear heuristic function.
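The update described above trains a heuristic evaluation function toward the value returned by a deeper alpha-beta search from the same position. The sketch below illustrates that idea for a linear heuristic with a stubbed-out search; the feature map, the stub search, and the learning rate are assumptions, not the cited Meep implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 16
w = np.zeros(n_features)

def phi(position):
    """Placeholder feature vector for a position."""
    return rng.random(n_features)

def alpha_beta_value(position, depth=4):
    """Stand-in for the value returned by a depth-limited alpha-beta search."""
    return rng.normal()

def bootstrap_update(position, w, alpha=0.01):
    """Move the linear heuristic w . phi(s) toward the deep-search value."""
    feats = phi(position)
    target = alpha_beta_value(position)        # search value used as training target
    error = target - w @ feats                 # heuristic's current mistake
    return w + alpha * error * feats           # gradient step toward the search value

for position in range(1000):
    w = bootstrap_update(position, w)
print(np.round(w[:4], 3))
```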
Balancing MCTS by Dynamically Adjusting the Komi Value
  • P. Baudis
  • Computer Science
    J. Int. Comput. Games Assoc.
  • 2011
TLDR
A conjecture on the resilience of the game search tree to changes in the evaluation function throughout the search is formulated, and MCTS is compared with traditional tree search in the context of extreme positions.
The games computers (and people) play
...
...