In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages …
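As a rough illustration of the ascent scheme this abstract describes, the sketch below applies a plain fixed-step update to a biased gradient estimate. The function names, interfaces, and step-size schedule are illustrative assumptions, not the paper's own method:

```python
import numpy as np

def gradient_ascent(theta, estimate_gradient, step_size=0.1, iters=100):
    """Ascend the average reward using (possibly biased) gradient estimates.

    `estimate_gradient` is a hypothetical stand-in for an estimator such
    as the GPOMDP-style sketch shown later in this list.
    """
    for _ in range(iters):
        grad = estimate_gradient(theta)   # biased estimate of the gradient
        theta = theta + step_size * grad  # fixed-step ascent update
    return theta
```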
In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLeaf(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 …
In this paper we discuss the problem of automatically learning evaluation function parameters in a chess program. In particular, we describe some experiments in which our chess program KnightCap learnt the parameters of its evaluation function using a combination of Temporal Difference learning and on-line play on FICS and ICC. KnightCap is freely available …
In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLeaf(λ) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). The main success we report is …
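As a hedged sketch of the TDLeaf(λ) idea these abstracts describe: TD(λ) is applied not to raw position evaluations but to the evaluations of the principal-variation leaves returned by minimax search from each position in a game. The data layout and names below are illustrative assumptions, not the KnightCap code:

```python
import numpy as np

def tdleaf_update(leaf_evals, leaf_grads, weights, alpha=1e-3, lam=0.7):
    """One TDLeaf(lambda)-style weight update after a complete game.

    leaf_evals[t]: evaluation of the principal-variation leaf found by
        minimax search from position t.
    leaf_grads[t]: gradient of that leaf evaluation w.r.t. the weights.
    """
    N = len(leaf_evals)
    # Temporal differences between successive principal-leaf evaluations.
    d = [leaf_evals[t + 1] - leaf_evals[t] for t in range(N - 1)]
    update = np.zeros_like(weights)
    for t in range(N - 1):
        # Lambda-discounted sum of the current and future differences.
        td_sum = sum(lam ** (j - t) * d[j] for j in range(t, N - 1))
        update += leaf_grads[t] * td_sum
    return weights + alpha * update
```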
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The algorithm's chief advantages are that it requires only a single sample path of the underlying Markov chain, it uses only one free parameter β ∈ [0, 1) which has a …
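A minimal sketch of the GPOMDP-style estimator, assuming a generic environment interface (`env.reset`, `env.step`) and a parameterized stochastic policy that returns a sampled action together with the gradient of its log-probability; these interfaces are assumptions for illustration. A single long sample path suffices, matching the single-sample-path property claimed above:

```python
import numpy as np

def gpomdp_estimate(env, sample_action, theta, beta=0.9, T=10_000):
    """GPOMDP-style biased estimate of the average-reward gradient.

    beta in [0, 1) is the single free parameter: larger values reduce
    the bias of the estimate at the cost of higher variance.
    """
    z = np.zeros_like(theta)      # discounted eligibility trace
    delta = np.zeros_like(theta)  # running average of reward-weighted traces
    obs = env.reset()
    for t in range(T):
        # Sample an action; also get grad of log prob of that action.
        action, grad_log_prob = sample_action(theta, obs)
        obs, reward = env.step(action)
        z = beta * z + grad_log_prob             # trace: z <- beta*z + grad
        delta += (reward * z - delta) / (t + 1)  # incremental average
    return delta
```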
In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(λ) and another less radical variant, TD-directed(λ). In particular, our chess program, "KnightCap," used …