Hamid Reza Maei

Although the hippocampus plays a crucial role in the formation of spatial memories, as these memories mature they may become additionally (or even exclusively) dependent on extrahippocampal structures. However, the identity of these extrahippocampal structures that support remote spatial memory is currently not known. Using a Morris water-maze task, we show …
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the number of features. Our algorithm, Greedy-GQ, is an extension of recent work on gradient temporal-difference learning, which has hitherto been restricted to a prediction (policy evaluation) setting …
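The Greedy-GQ update described above can be sketched as follows. This is a minimal illustration on a toy random MDP, not the paper's experimental setup: the features, transition model, step sizes, and iteration count are all assumptions chosen for brevity. The algorithm combines the usual Q-learning TD error with an auxiliary weight vector w that corrects the gradient for off-policy sampling, keeping each step O(number of features).

```python
import numpy as np

# Hedged sketch of Greedy-GQ for off-policy control with linear
# function approximation. The toy MDP below is an illustrative
# assumption, not taken from the paper.
rng = np.random.default_rng(1)
nS, nA, nF = 4, 2, 3            # states, actions, features (assumed sizes)
gamma, alpha, beta = 0.9, 0.05, 0.1  # discount and step sizes (assumed)

phi = rng.normal(size=(nS, nA, nF))           # fixed state-action features
P = rng.dirichlet(np.ones(nS), size=(nS, nA)) # transition probabilities
R = rng.normal(size=(nS, nA))                 # expected rewards

theta = np.zeros(nF)   # action-value weights: Q(s, a) ~= theta . phi[s, a]
w = np.zeros(nF)       # auxiliary weights for the gradient correction

s = 0
for _ in range(10_000):
    a = rng.integers(nA)                  # uniform-random behavior policy
    s2 = rng.choice(nS, p=P[s, a])
    r = R[s, a]
    q_next = phi[s2] @ theta              # Q-values at s2, shape (nA,)
    a_star = int(np.argmax(q_next))       # greedy target action
    delta = r + gamma * q_next[a_star] - phi[s, a] @ theta  # TD error
    # Main update: Q-learning step plus a correction term built from w
    theta += alpha * (delta * phi[s, a]
                      - gamma * (w @ phi[s, a]) * phi[s2, a_star])
    # Auxiliary update: w tracks the expected TD error per feature
    w += beta * (delta - w @ phi[s, a]) * phi[s, a]
    s = s2
```

Both updates touch each feature once, so the per-time-step cost is linear in nF, which is the complexity property the abstract emphasizes.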
A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, we introduce the GQ(λ) algorithm, which can be seen as an extension of that work to a more general setting including eligibility traces and off-policy learning of …
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as …
We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, behavior policy, and target policy, and whose complexity scales linearly in the number of parameters. We consider an i.i.d. policy-evaluation setting in which the data need not come …
Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal-difference (GTD) algorithm converges reliably, it …
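The gradient-TD idea behind the two abstracts above can be sketched with the GTD2 variant for policy evaluation. This is a minimal illustration on a toy random Markov reward process; the features, transition matrix, and step sizes are assumptions for the sketch, not from the papers. Two weight vectors are maintained: theta approximates the value function, and an auxiliary vector w estimates a correction term, so each update remains linear in the number of features.

```python
import numpy as np

# Hedged sketch of the GTD2 update for off-policy policy evaluation
# with linear function approximation. The toy Markov reward process
# below is an illustrative assumption, not taken from the papers.
rng = np.random.default_rng(0)
n_states, n_features = 5, 3
gamma = 0.9
alpha, beta = 0.05, 0.1          # primary and auxiliary step sizes (assumed)

phi = rng.normal(size=(n_states, n_features))        # fixed random features
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
R = rng.normal(size=n_states)                        # expected reward per state

theta = np.zeros(n_features)     # value-function weights: V(s) ~= theta . phi[s]
w = np.zeros(n_features)         # auxiliary weights

s = 0
for _ in range(20_000):
    s_next = rng.choice(n_states, p=P[s])
    r = R[s]
    delta = r + gamma * phi[s_next] @ theta - phi[s] @ theta  # TD error
    # theta follows an estimate of the (negative) gradient of the
    # mean-squared projected Bellman error ...
    theta += alpha * (phi[s] - gamma * phi[s_next]) * (phi[s] @ w)
    # ... while w tracks the least-squares estimate of the expected
    # TD error per feature.
    w += beta * (delta - phi[s] @ w) * phi[s]
    s = s_next
```

Both updates are O(n_features) per step, which is the linear-complexity property these abstracts highlight; the auxiliary vector w is what restores stability under off-policy sampling.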
The water maze is commonly used to assay spatial cognition, or, more generally, learning and memory in experimental rodent models. In the water maze, mice or rats are trained to navigate to a platform located below the water's surface. Spatial learning is then typically assessed in a probe test, where the platform is removed from the pool and the mouse or rat …
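Two measures commonly computed from a probe-test swim path are time in the target quadrant and mean proximity to the former platform location. The sketch below is a hypothetical illustration of how such measures might be computed from tracked coordinates; the function name, coordinate convention (pool centered at the origin), and inputs are assumptions, not the paper's protocol or code.

```python
import numpy as np

def probe_measures(xy, platform_xy, pool_center=(0.0, 0.0)):
    """Compute two probe-test measures from a tracked swim path.

    xy          : (T, 2) array of tracked positions at a fixed sample rate
    platform_xy : former platform location
    Returns (percent time in target quadrant, mean proximity to platform).
    Hypothetical helper: names and conventions are assumed for illustration.
    """
    xy = np.asarray(xy, dtype=float)
    px, py = platform_xy
    cx, cy = pool_center
    # A sample is in the target quadrant if it falls on the platform's
    # side of the pool center on both axes.
    in_quadrant = ((xy[:, 0] - cx) * np.sign(px - cx) > 0) & \
                  ((xy[:, 1] - cy) * np.sign(py - cy) > 0)
    pct_quadrant = 100.0 * in_quadrant.mean()
    # Mean distance to the former platform location over the whole probe.
    proximity = np.linalg.norm(xy - [px, py], axis=1).mean()
    return pct_quadrant, proximity
```

For example, a three-sample path with two samples in the platform's quadrant yields a quadrant occupancy of about 66.7%; proximity-based measures weight every sample by its distance to the platform rather than by a binary quadrant membership.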