Learning to Play: Reinforcement Learning and Games

A. Plaat · Published 2020 · Computer Science
…principled frameworks such as minimax, reinforcement learning, or function approximation. In addition to the elegant conceptual frameworks, deep, dirty, domain-specific understanding is necessary for progress in this field [594].

8.1.4 Towards General Intelligence: Let us revisit the problem statement from the Introduction. What are the machine learning methods that are used in Chess and Go to achieve a level of play stronger than the strongest humans? Various reinforcement learning methods have…

Warm-Start AlphaZero Self-Play Search Enhancements

This work proposes a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE), dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA).
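The dynamic weighting idea can be sketched as a leaf-value mix that starts out trusting the classical estimators and shifts to the network as training progresses. The linear decay schedule, the equal rollout/RAVE split, and all names below are illustrative assumptions, not the paper's exact scheme:

```python
# Sketch of a dynamically weighted warm-start value: early in self-play
# the leaf evaluation leans on rollout and RAVE estimates, later on the
# (by then trained) neural network. Schedule and names are assumptions.

def warmstart_value(rollout_v, rave_v, net_v, iteration, warmup_iters=20):
    # Weight on the classical estimators decays linearly to zero.
    w = max(0.0, 1.0 - iteration / warmup_iters)
    classic = 0.5 * (rollout_v + rave_v)  # simple average of the two
    return w * classic + (1.0 - w) * net_v
```

At iteration 0 the network's (still random) value is ignored entirely; after `warmup_iters` self-play iterations only the network is used.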

Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning

It is concluded that AlphaZero-like deep reinforcement learning benefits from adaptive rollout based warm-start, as Rapid Action Value Estimate did for rollout-based reinforcement learning 15 years ago.

High-Accuracy Model-Based Reinforcement Learning, a Survey

This paper surveys model-based reinforcement learning methods, explaining in detail how they work and what their strengths and weaknesses are, and concludes with a research agenda for future work to make the methods more robust and more widely applicable to other applications.

Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey

A taxonomy based on three approaches: using explicit planning on given transitions, using explicit plans on learned transitions, and end-to-end learning of both planning and transitions is proposed.

Potential-based Reward Shaping in Sokoban

This work demonstrates the possibility of solving multiple instances with the help of reward shaping, and indicates that distance-based potential functions are a suitable choice for Sokoban, a well-known planning task.
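Potential-based shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known to preserve the optimal policy. A minimal sketch, assuming a grid world with a Manhattan-distance potential (the grid, distance choice, and γ value are illustrative, not the paper's exact setup):

```python
# Sketch of potential-based reward shaping with a distance-based
# potential, in the spirit of the Sokoban paper above.

GAMMA = 0.99

def potential(state, goal):
    # Negative Manhattan distance: states closer to the goal have higher
    # potential, so steps toward the goal receive positive shaping.
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(reward, state, next_state, goal):
    # F(s, s') = gamma * phi(s') - phi(s); adding F to the reward
    # leaves the optimal policy of the original MDP unchanged.
    return reward + GAMMA * potential(next_state, goal) - potential(state, goal)
```

For example, moving from (0, 0) to (0, 1) with the goal at (0, 3) yields a shaping bonus of 0.99·(−2) − (−3) = 1.02 on top of the environment reward.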

Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

The recent impressive performance of deep self-learning reinforcement learning approaches from AlphaGo/AlphaZero is taken as inspiration to design a searcher for Morpion Solitaire, which achieves a score very close to the human best without any adaptation to the problem other than using ranked reward.

A New Challenge: Approaching Tetris Link with AI

This paper introduces a board game, Tetris Link, that is yet unexplored and appears to be highly challenging, and explores heuristic planning and two other approaches: Reinforcement Learning and Monte Carlo tree search.

Transfer Learning and Curriculum Learning in Sokoban

It is found that reusing feature representations learned previously can accelerate learning new, more complex, instances, and in effect, it is shown how curriculum learning, from simple to complex tasks, works in Sokoban.

Analysis of Hyper-Parameters for Alphazero-Like Deep Reinforcement Learning

This paper investigates 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluates how these parameters contribute to training, finding that the number of self-play iterations subsumes MCTS search simulations, game episodes, and training epochs.

Learning Self-Game-Play Agents for Combinatorial Optimization Problems

The Zermelo Gamification (ZG) is proposed, following the idea of Hintikka's Game-Theoretical Semantics, to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

This paper generalizes the AlphaZero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
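AlphaZero-style search selects children with the PUCT rule, where the network's policy prior steers exploration and the running value estimate steers exploitation. A minimal sketch, with a simplified node layout and an assumed `c_puct` constant (the exact formula variant differs slightly across implementations):

```python
import math

# Sketch of PUCT child selection as used in AlphaZero-like MCTS.
# Each child is a dict with prior P, visit count N, and total value W.

def select_child(children, c_puct=1.5):
    total_n = sum(ch["N"] for ch in children)

    def puct(ch):
        q = ch["W"] / ch["N"] if ch["N"] else 0.0          # mean value
        u = c_puct * ch["P"] * math.sqrt(total_n + 1) / (1 + ch["N"])
        return q + u                                        # value + exploration bonus

    return max(children, key=puct)
```

On a fresh node (all visit counts zero) the rule reduces to picking the child with the highest prior, which is how the network biases the search before any simulations return.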

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

It is demonstrated that even a very basic canonical ES algorithm can match or exceed the performance of traditional RL algorithms, and this is likely to lead to new advances in the state of the art for solving RL problems.
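One generation of a canonical (μ, λ)-ES can be sketched as: sample λ Gaussian perturbations, evaluate them, and move the parameters toward a rank-weighted mean of the μ best. The fitness function, population sizes, and step size below are placeholder assumptions:

```python
import numpy as np

# Sketch of one canonical ES generation with log-rank recombination
# weights, in the spirit of the benchmark paper above.

def canonical_es_step(theta, fitness, lam=10, mu=5, sigma=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    # Log-rank recombination weights for the mu best offspring.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    eps = rng.standard_normal((lam, theta.size))       # perturbation directions
    scores = np.array([fitness(theta + sigma * e) for e in eps])
    best = np.argsort(scores)[::-1][:mu]               # top-mu by fitness
    # Step toward the weighted mean of the best perturbations.
    return theta + sigma * (w @ eps[best])
```

Note the algorithm needs only fitness evaluations, no gradients, which is what makes it attractive for Atari-scale policy search.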

Assessing the Potential of Classical Q-learning in General Game Playing

GGP allows us to show that, if augmented by appropriate enhancements, classical table-based Q-learning can perform well in small games; inspired by Gelly & Silver (ICML 2007), the authors combine online search with offline learning and propose QM-learning for GGP.
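The classical method being augmented is the one-step tabular Q-learning backup, Q(s,a) ← Q(s,a) + α(r + γ maxₐ′ Q(s′,a′) − Q(s,a)), paired with ε-greedy exploration. A minimal sketch with illustrative hyper-parameters (not the paper's tuned values):

```python
import random
from collections import defaultdict

# Sketch of classical table-based Q-learning, the baseline the GGP
# paper enhances with online search.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def q_update(Q, s, a, r, s_next, actions):
    # One-step Q-learning backup toward r + gamma * max_a' Q(s', a').
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions):
    # Explore with probability epsilon, otherwise exploit the table.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

The `defaultdict(float)` table makes every unseen state-action pair start at zero, which is the usual tabula-rasa initialization for small games.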

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

This paper generalises the approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains, and convincingly defeated a world-champion program in each case.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play, and new agents based on this idea are proposed and shown to outperform DQN.

Thinking Fast and Slow with Deep Learning and Tree Search

This paper presents Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks, and shows that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and the final tree search agent, trained tabula rasa, defeats MoHex 1.0.
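The ExIt decomposition can be sketched as a loop in which a tree-search "expert" generates improved move targets and a neural "apprentice" imitates them, with the stronger apprentice then guiding the next round of search. All function names below are placeholders for the paper's components, not its actual API:

```python
# Schematic Expert Iteration loop: alternate expert improvement
# (search guided by the current network) with apprentice learning
# (imitating the search's improved move choices).

def expert_iteration(initial_net, run_search, train_on, n_iters=5):
    net = initial_net
    for _ in range(n_iters):
        games = run_search(net)    # expert: search produces improved targets
        net = train_on(net, games) # apprentice: network imitates the expert
    return net
```

The same planning/generalisation split reappears in AlphaZero, where MCTS plays the expert role and the policy/value network the apprentice role.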

Reinforcement Learning of Local Shape in the Game of Go

The results show that small, translation-invariant templates are surprisingly effective in Go, and the linear evaluation function appears to outperform all other static evaluation functions that do not incorporate substantial domain knowledge.

Mastering the game of Go with deep neural networks and tree search

Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.