Temporal difference learning and TD-Gammon

  • Gerald Tesauro
  • Published 1 March 1995
  • Computer Science
  • Commun. ACM
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the… 

Solving Go on a 3 x 3 Board Using Temporal-Difference Learning

This work attempted to solve Go on a 3 x 3 board using another approach, TD-learning: it developed learning agents using TD(0), the simplest form of TD(λ), and TD-directed(0), each with a lookup table, and trained them against several self-developed training agents.
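The TD(0) update with a lookup table mentioned above can be illustrated with a minimal sketch. This is an assumed, generic tabular TD(0) implementation on a toy episodic chain, not code from the paper; the states, rewards, and hyperparameters are illustrative.

```python
# Sketch of tabular TD(0), the simplest form of TD(lambda):
# the value table is nudged toward the one-step bootstrapped
# target r + gamma * V(s'). All details here are assumptions
# for illustration, not taken from the cited work.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """One TD(0) backup on a dict-based value table V."""
    v_s = V.get(s, 0.0)
    v_next = V.get(s_next, 0.0)
    V[s] = v_s + alpha * (r + gamma * v_next - v_s)

# Toy episodic chain 0 -> 1 -> terminal, reward 1 on reaching
# terminal; repeated episodes drive V(0) and V(1) toward 1.0.
V = {}
for _ in range(500):
    td0_update(V, 0, 0.0, 1)           # step 0 -> 1, no reward
    td0_update(V, 1, 1.0, "terminal")  # step 1 -> terminal, reward 1
print(round(V[0], 2), round(V[1], 2))  # -> 1.0 1.0
```

In a game setting, the same backup would be applied to successive board positions visited during (self-)play, with the lookup table keyed by the board state.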

Using Reinforcement Learning in Chess Engines

This chess engine proved that reinforcement learning in combination with the classification of board states leads to a notable improvement when compared with other engines that use only reinforcement learning, such as KnightCap.


The main aim of the paper is to explore the possibility of applying reinforcement learning, as popularized by TD-Gammon, to a game without random factors, using the popular game of checkers.

TD-GAC: Machine Learning Experiment with Give-Away Checkers

Results of applying temporal difference learning methods to the game of give-away checkers show the success of these methods in improving the quality of the computer player's policy.

Why co-evolution beats temporal difference learning at Backgammon for a linear architecture, but not a non-linear architecture

  • P. Darwen
  • Computer Science
    Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No.01TH8546)
  • 2001
This paper compares co-evolutionary learning and temporal difference learning on the game of Backgammon, which (like many real-world tasks) has an element of random uncertainty.

A Gamut of Games

The past successes, current projects, and future research directions for AI using computer games as a research test bed are reviewed.

Move Ranking and Evaluation in the Game of Arimaa

The work presented here is the first major attempt to apply the tools of machine learning to Arimaa, and makes two main contributions to the state-of-the-art in artificial intelligence for this game.

Learning Minesweeper with Multirelational Learning

This paper shows that when integrating certain techniques into a general purpose learning system (Mio), the resulting system is capable of inducing a Minesweeper playing strategy that beats the winning rate of average human players.

Move Prediction in the Game of Go

A novel move prediction system is created, based on a naive Bayes model, which builds upon the work of several previous move prediction systems and achieves competitive results in terms of move prediction accuracy when tested on professional games and high-ranking amateur games.

Temporal Difference Learning of Position Evaluation in the Game of Go

This work demonstrates a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning, based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play.

Some Studies in Machine Learning Using the Game of Checkers

  • A. Samuel
  • Computer Science
    IBM J. Res. Dev.
  • 1959
A new signature-table technique is described together with an improved book-learning procedure which is thought to be much superior to the linear polynomial method and to permit the program to look ahead to a much greater depth than it otherwise could do.

Programming a computer for playing chess

This paper is concerned with the problem of constructing a computing routine or “program” for a modern general purpose computer which will enable it to play chess. Although perhaps of no practical… 

Neurogammon Wins Computer Olympiad

Neurogammon 1.0 is a backgammon program which uses multilayer neural networks to make move decisions and doubling decisions and won the First Computer Olympiad in London with a perfect record of five wins and no losses.

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations

The fundamental principles, basic mechanisms, and formal analyses involved in the development of parallel distributed processing (PDP) systems are presented in individual chapters contributed by… 

On Optimal Doubling in Backgammon

The concept of an effective doubling number for noncontinuous games is introduced. Computer simulation is employed to determine an extremely accurate strategy for accepting, doubling, and redoubling… 

The Cascade-Correlation Learning Architecture

The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.