We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM). We prove that the UOM…
In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions. We also introduce the concept of factored linear action models; a special case. Again, the relation of factored linear…
We propose a new class of algorithms that directly precondition the TD update. We then focus on a new preconditioned algorithm and prove its convergence. Empirical results on the new algorithm shall be presented in a detailed version of this paper. … and iLSTD via a class of Preconditioned TD (PTD) algorithms. This paper explores yet another class of…
We consider the problem of policy evaluation in a special class of Markov Decision Processes (MDPs) where the underlying Markov chains are large and sparse. We start from a stationary model equation that the limit of Temporal Difference (TD) learning satisfies, and develop a Robbins-Monro method consistently estimating its coefficients. Then we introduce…
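As a rough illustration of the kind of stationary equation a TD limit satisfies, the sketch below assumes the standard linear TD(0) fixed point A θ = b, with A = Φᵀ D (Φ − γ P Φ) and b = Φᵀ D r. The truncated abstract does not state its equation, so this form, the three-state chain, the rewards, and the one-hot features are all assumptions made for illustration; the code is not the paper's Robbins-Monro method.

```python
import numpy as np

# Hypothetical 3-state Markov chain (assumed for illustration only).
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
r = np.array([0.0, 0.0, 1.0])   # assumed expected reward per state
gamma = 0.9                     # assumed discount factor
Phi = np.eye(3)                 # one-hot (tabular) features, assumed

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmax(np.real(evals))])
d = d / d.sum()
D = np.diag(d)

# Coefficients of the assumed stationary equation A @ theta = b.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta = np.linalg.solve(A, b)

# Reference: exact value function from the Bellman equation.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

With one-hot features the solution `Phi @ theta` coincides with the exact value function `v_true`; with fewer features than states it would generally be only an approximation.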
We consider linear prediction problems in a stochastic environment. The least mean square (LMS) algorithm is a well-known, easy to implement and computationally cheap solution to this problem. However, as it is well known, the LMS algorithm, being a stochastic gradient descent rule, may converge slowly. The recursive least squares (RLS) algorithm…
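For readers unfamiliar with LMS, the following is a minimal sketch of the standard update, a stochastic gradient step on the squared prediction error. The function name `lms`, the step size, and the synthetic data are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def lms(X, y, step_size=0.05):
    """One pass of least mean squares over the rows of X."""
    w = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        error = y_t - w @ x_t            # prediction error on this sample
        w = w + step_size * error * x_t  # stochastic gradient step
    return w

# Synthetic linear prediction problem (assumed for illustration).
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5000, 3))
y = X @ true_w + 0.01 * rng.normal(size=5000)

w_hat = lms(X, y)
```

Each update costs time linear in the number of features, which is why LMS is computationally cheap; the fixed scalar step size is also the reason it can converge slowly when the features are badly scaled.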
Dyna planning is an efficient way of learning from real and imaginary experience. Existing tabular and linear Dyna algorithms are single-step, because an "imaginary" feature is predicted only one step into the future. In this paper, we introduce multi-step Dyna planning, which predicts more steps into the future. Multi-step Dyna is able to figure out a…