Mastering the game of Go without human knowledge

@article{Silver2017MasteringTG,
  title={Mastering the game of Go without human knowledge},
  author={David Silver and Julian Schrittwieser and Karen Simonyan and Ioannis Antonoglou and Aja Huang and Arthur Guez and Thomas Hubert and Lucas Baker and Matthew Lai and Adrian Bolton and Yutian Chen and Timothy P. Lillicrap and Fan Hui and Laurent Sifre and George van den Driessche and Thore Graepel and Demis Hassabis},
  journal={Nature},
  year={2017},
  volume={550},
  pages={354--359}
}
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. […] AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher-quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved…
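The training loop this abstract describes can be made concrete in a few dozen lines. The following is a minimal, self-contained illustration on a toy take-the-last-stone game, not the paper's implementation: the tabular "network" and the rollout-based stub search are stand-ins (AlphaGo Zero uses a deep residual network and PUCT-based Monte-Carlo tree search), but the loop has the same shape: search produces a sharpened move distribution, finished games produce winner labels, and the network is trained toward both.

```python
import numpy as np

NUM_MOVES = 3        # each turn a player takes 1-3 stones; last stone wins
START_STONES = 7
rng = np.random.default_rng(0)

class PolicyValueNet:
    """Tabular stand-in for the paper's deep network: per-state move
    logits (policy head) and a scalar predicted winner (value head)."""
    def __init__(self):
        self.logits, self.values = {}, {}

    def predict(self, stones):
        z = self.logits.setdefault(stones, np.zeros(NUM_MOVES))
        e = np.exp(z - z.max())
        return e / e.sum(), self.values.setdefault(stones, 0.0)

    def train(self, stones, pi, z, lr=0.2):
        # Nudge the policy toward the search policy pi and the value
        # toward the actual game outcome z: the two targets named in
        # the abstract.
        p, v = self.predict(stones)
        self.logits[stones] += lr * (pi - p)
        self.values[stones] = v + lr * (z - v)

def legal_moves(stones):
    return list(range(1, min(NUM_MOVES, stones) + 1))

def rollout(net, stones):
    """Play out with the current policy; return the outcome (+1/-1)
    for the player to move at `stones`."""
    sign = 1
    while stones > 0:
        p, _ = net.predict(stones)
        moves = legal_moves(stones)
        q = np.array([p[m - 1] for m in moves])
        stones -= rng.choice(moves, p=q / q.sum())
        sign = -sign
    return -sign     # whoever took the last stone won

def search_policy(net, stones, n_sims=16, temp=4.0):
    """Stub 'search': score each legal move by rollouts, then sharpen.
    A real MCTS would build a PUCT tree and use the value head to
    truncate rollouts; the stub keeps the sketch self-contained."""
    scores = np.full(NUM_MOVES, -np.inf)
    for m in legal_moves(stones):
        scores[m - 1] = np.mean([-rollout(net, stones - m) for _ in range(n_sims)])
    pi = np.exp(temp * scores)      # exp(-inf) = 0 masks illegal moves
    return pi / pi.sum()

def self_play_game(net):
    stones, sign, history = START_STONES, 1, []
    while stones > 0:
        pi = search_policy(net, stones)
        history.append((stones, pi, sign))
        stones -= rng.choice(NUM_MOVES, p=pi) + 1
        sign = -sign
    winner = -sign
    for s, pi, side in history:     # the network becomes its own teacher
        net.train(s, pi, winner * side)

net = PolicyValueNet()
for _ in range(300):
    self_play_game(net)
print("policy at 7 stones:", np.round(net.predict(7)[0], 2))  # should favour taking 3
```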

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

TLDR
This paper generalises the approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains, and that convincingly defeated a world-champion program in each case.

Artificial intelligence: Learning to play Go from scratch

TLDR
An artificial-intelligence program called AlphaGo Zero has mastered the game of Go without any human data or guidance, and the work suggests that the same fundamental principles of the game have some universal character, beyond human bias.

Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge

TLDR
The ultimate goal of this paper is to provide exploratory insights and mature auxiliary tools to enable AI researchers and computer-game communities to study, test, and improve these promising state-of-the-art methods at a much lower cost of computing resources.

Mastering the game of Go with deep neural networks and tree search

TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program defeated a human professional player in the full-sized game of Go.

AlphaDDA: game artificial intelligence with dynamic difficulty adjustment using AlphaZero

TLDR
This study shows that AlphaDDA can balance its skill with that of the other AI agents, except for a random player, and suggests that the AlphaDDA approach can be used for any game in which the DNN can estimate the value from the state.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

TLDR
This paper generalizes the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world-champion program in the games of chess and shogi (Japanese chess), as well as Go.

A Game-centric Approach to Teaching Artificial Intelligence

TLDR
A game-centric approach to teaching artificial intelligence that follows the historical development of algorithms by popping the hood of these champion bots is presented, and a server infrastructure for playing card games in both perfect-information and imperfect-information modes is made available.

Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data

TLDR
Improvements include modifications to the neural network design and training configuration, the introduction of a data-normalization step, and a more sample-efficient Monte-Carlo tree search with a lower chance of blundering.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

TLDR
The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
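The phrase "planning with a learned model" comes down to three learned functions: representation, dynamics, and prediction. Below is a minimal sketch under assumed shapes, with untrained linear maps standing in for the real networks (MuZero's dynamics function additionally predicts a reward, omitted here to keep the skeleton short).

```python
import numpy as np

class LearnedModel:
    """The three functions the summary alludes to, as tiny untrained
    linear maps: representation h = f(obs), dynamics s' = g(s, a), and
    prediction (policy, value) = p(s). All shapes are illustrative."""
    def __init__(self, obs_dim=4, latent_dim=8, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        self.H = rng.normal(size=(latent_dim, obs_dim))                 # representation
        self.G = rng.normal(size=(latent_dim, latent_dim + n_actions))  # dynamics
        self.P = rng.normal(size=(n_actions, latent_dim))               # policy head
        self.v = rng.normal(size=latent_dim)                            # value head
        self.n_actions = n_actions

    def represent(self, obs):
        return np.tanh(self.H @ obs)

    def step(self, state, action):
        # Advance the *latent* state; no game simulator is consulted.
        one_hot = np.eye(self.n_actions)[action]
        return np.tanh(self.G @ np.concatenate([state, one_hot]))

    def predict(self, state):
        logits = self.P @ state
        probs = np.exp(logits - logits.max())
        return probs / probs.sum(), float(np.tanh(self.v @ state))

# Planning unrolls imagined trajectories entirely inside the latent space:
model = LearnedModel()
s = model.represent(np.ones(4))
for a in (1, 0, 2):
    s = model.step(s, a)
print(model.predict(s))
```

A tree search built on this interface expands nodes by calling step() and evaluates them with predict(), never touching the real environment's dynamics during planning.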

Warm-Start AlphaZero Self-Play Search Enhancements

TLDR
This work proposes a novel approach to the cold-start problem of self-play training by employing simple search enhancements during its early phase, namely Rollout, Rapid Action Value Estimate (RAVE), dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA).
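One way to read "dynamically weighted combinations" is a leaf evaluation that starts out trusting a classical estimate and anneals toward the network as training progresses. The sketch below makes that assumption explicit; the linear schedule, the warmup length, and the function name are illustrative, not the paper's exact scheme.

```python
def combined_value(v_enhancement, v_net, iteration, warmup=50):
    """Blend a rollout/RAVE-style estimate with the network's value.

    Early iterations (cold, untrained network) lean on v_enhancement;
    the weight decays linearly to zero over `warmup` iterations, after
    which the network's value is used alone.
    """
    w = max(0.0, 1.0 - iteration / warmup)
    return w * v_enhancement + (1.0 - w) * v_net

print(combined_value(0.8, -0.2, iteration=5))   # mostly the enhancement
print(combined_value(0.8, -0.2, iteration=60))  # entirely the network
```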
...

References

Showing 1-10 of 67 references

Mastering the game of Go with deep neural networks and tree search

TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program defeated a human professional player in the full-sized game of Go.

Temporal Difference Learning of Position Evaluation in the Game of Go

TLDR
This work demonstrates a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning, based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play.
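The TD idea summarized here fits in a few lines: each position's evaluation is nudged toward the evaluation of the position that follows it, with the final position anchored to the game result. A sketch assuming a linear evaluator over hypothetical features (the original work instead uses networks shaped to the Go board's spatial structure):

```python
import numpy as np

def td0_update(w, positions, outcome, alpha=0.01):
    """One TD(0) sweep over a single game.

    w:         weights of a linear evaluator V(s) = w . phi(s)
    positions: feature vectors phi(s_0), ..., phi(s_T) from one game
    outcome:   final result from a fixed player's perspective (+1 / -1)
    """
    for t, f in enumerate(positions):
        # The target for each position is the evaluation of the next
        # one; the last position is anchored to the game outcome.
        target = outcome if t == len(positions) - 1 else w @ positions[t + 1]
        w = w + alpha * (target - w @ f) * f     # semi-gradient TD(0) step
    return w

# Toy usage with random stand-in features:
rng = np.random.default_rng(0)
w = np.zeros(8)
for _ in range(100):
    w = td0_update(w, [rng.normal(size=8) for _ in range(10)], outcome=1.0)
```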

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

TLDR
The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.

Bootstrapping from Game Tree Search

TLDR
This paper introduces a new algorithm for updating the parameters of a heuristic evaluation function by moving the heuristic towards the values computed by an alpha-beta search, and implements the algorithm in the chess program Meep, using a linear heuristic function.
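The update described here can be sketched as a RootStrap-style step: run an alpha-beta search from the current position and move the heuristic's parameters toward the returned minimax value. The toy game, features, and learning rate below are assumptions made for the sake of a runnable example; the paper's full TreeStrap variant also updates toward the values of interior tree nodes.

```python
import numpy as np

MAX_TAKE = 3   # toy game: take 1-3 stones, taking the last stone wins

def phi(stones):
    # Hypothetical features: one-hot of stones mod 4 (the game's true
    # win/loss pattern, so a linear heuristic can represent it).
    f = np.zeros(4)
    f[stones % 4] = 1.0
    return f

def alphabeta(stones, depth, alpha, beta, w):
    """Fail-soft negamax alpha-beta; leaves use the linear heuristic w.phi."""
    if stones == 0:
        return -1.0                  # the player to move has already lost
    if depth == 0:
        return float(w @ phi(stones))
    best = -np.inf
    for m in range(1, min(MAX_TAKE, stones) + 1):
        v = -alphabeta(stones - m, depth - 1, -beta, -alpha, w)
        best = max(best, v)
        alpha = max(alpha, v)
        if alpha >= beta:            # cutoff
            break
    return best

def rootstrap_step(w, stones, depth=3, lr=0.05):
    # Move the heuristic at the root toward the deeper search value.
    target = alphabeta(stones, depth, -np.inf, np.inf, w)
    return w + lr * (target - w @ phi(stones)) * phi(stones)

rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(500):
    w = rootstrap_step(w, stones=int(rng.integers(1, 20)))
print(np.round(w, 2))   # positions with stones % 4 == 0 should score lowest
```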

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

TLDR
This paper introduces the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge, and combines fictitious self-play with deep reinforcement learning.

DeepStack: Expert-level artificial intelligence in heads-up no-limit poker

TLDR
DeepStack is introduced, an algorithm for imperfect-information settings that combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning.

Training Deep Convolutional Neural Networks to Play Go

TLDR
The convolutional neural networks trained in this work can consistently defeat the well-known Go program GNU Go and win some games against the state-of-the-art Go program Fuego, while using a fraction of the play time.

Human-level control through deep reinforcement learning

TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

The Integration of A Priori Knowledge into a Go Playing Neural Network

TLDR
Methods for integrating expert Go knowledge into a learning artificial neural network are implemented in the program NeuroGo, which achieves a playing strength equal to that of a conventional program playing at a medium level.

Move Evaluation in Go Using Deep Convolutional Neural Networks

TLDR
A large 12-layer convolutional neural network trained by supervised learning on a database of human professional games beats the traditional search program GnuGo in 97% of games and matches the performance of a state-of-the-art Monte-Carlo tree search that simulates a million positions per move.
...