Temporal difference learning and TD-Gammon
TLDR
The domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning.
Practical Issues in Temporal Difference Learning
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of ...
Agent-Human Interactions in the Continuous Double Auction
TLDR
It is found that agents consistently obtain significantly larger gains from trade than their human counterparts, in sharp contrast to the robust convergence observed in previous all-human or all-agent CDA experiments.
TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play
  • G. Tesauro
  • Computer Science
  • Neural Computation
  • 1 March 1994
TLDR
The latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
Temporal Difference Learning and TD-Gammon
  • G. Tesauro
  • Computer Science
  • J. Int. Comput. Games Assoc.
  • 1995
TLDR
TD-GAMMON is a neural network that trains itself to be an evaluation function for the game of backgammon by playing against itself and learning from the outcome.
R3: Reinforced Ranker-Reader for Open-Domain Question Answering
TLDR
This paper proposes a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages by how likely each is to yield the ground-truth answer to a given question. It also proposes a novel method, based on reinforcement learning, that jointly trains the Ranker with an answer-extraction Reader model.
Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference
TLDR
This work proposes a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples, and introduces a new algorithm, Meta-Experience Replay, that directly exploits this view by combining experience replay with optimization-based meta-learning.
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
TLDR
Two models are proposed that use multiple passages to generate their answers, following an answer-reranking approach that reorders the answer candidates generated by an existing state-of-the-art QA model.
Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
TLDR
This work introduces the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes. The model is more adept at overcoming the sparsity of natural language and better able to capture long-term structure.
Analyzing Complex Strategic Interactions in Multi-Agent Systems
We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, ...