
We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM). We prove that the UOM of…
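The abstract breaks off before the UOM is defined, so the following is only a hedged illustration of the general idea of a model that is built without any reward and evaluated against rewards supplied later. It is not the paper's UOM. It assumes a tabular option whose continuation dynamics are a hypothetical sub-stochastic matrix `P_CONT` (each row's missing mass is the probability of terminating) and uses the standard identity that the discounted occupancy matrix U = (I - γ P_CONT)^-1, computed once, turns any reward vector r into expected option returns U r.

```python
# Hedged illustration (not the paper's UOM): a reward-independent model of a
# tabular option, built once from the option's continuation dynamics.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def occupancy(p_cont, gamma):
    """Discounted occupancy U = (I - gamma * P_cont)^-1; no reward involved."""
    n = len(p_cont)
    a = [[(1.0 if i == j else 0.0) - gamma * p_cont[i][j] for j in range(n)]
         for i in range(n)]
    # Column j of U solves a x = e_j.
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def option_return(u, reward):
    """Expected discounted reward collected by the option from each start state."""
    return [sum(u_ij * r_j for u_ij, r_j in zip(row, reward)) for row in u]


# Hypothetical 3-state option: rows may sum to less than 1; the missing mass is
# the probability of terminating after that state (state 2 always terminates).
P_CONT = [[0.0, 0.8, 0.0],
          [0.0, 0.0, 0.5],
          [0.0, 0.0, 0.0]]
U = occupancy(P_CONT, gamma=0.9)  # computed once, before any reward is known

if __name__ == "__main__":
    for reward in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]):  # rewards supplied later
        print(option_return(U, reward))
```

Once `U` is built, each newly specified reward costs only a matrix-vector product; how the paper obtains this property with function approximation is not recoverable from the truncated text.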

- Hengshuai Yao, Csaba Szepesvári
- AAAI
- 2012

In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAMAPI, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step in…
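The first step the abstract describes, building a linear action model from batch data, could be sketched as follows under common assumptions: features φ(s) are given, and each action a gets a matrix F_a and a vector f_a with φ(s') ≈ F_a φ(s) and r ≈ f_a · φ(s), fitted by ridge-regularised least squares. The function names, the ridge term and the sample format are hypothetical; the paper's exact estimator and its API step are not shown in the truncated abstract.

```python
# Hedged sketch: fit a per-action linear model phi' ~ F_a phi, r ~ f_a . phi
# from batch data by ridge-regularised least squares. Names are hypothetical.
from collections import defaultdict


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_action_model(samples, dim, ridge=1e-6):
    """samples: iterable of (phi, action, reward, phi_next) tuples.

    Returns {action: (F_a, f_a)}, with F_a stored as a list of rows.
    """
    def empty_stats():
        return ([[0.0] * dim for _ in range(dim)],   # D = sum phi phi^T
                [[0.0] * dim for _ in range(dim)],   # E[k] = sum phi * phi_next[k]
                [0.0] * dim)                          # e = sum phi * reward

    stats = defaultdict(empty_stats)
    for phi, action, reward, phi_next in samples:
        d, e_next, e_rew = stats[action]
        for i in range(dim):
            e_rew[i] += phi[i] * reward
            for j in range(dim):
                d[i][j] += phi[i] * phi[j]
                e_next[j][i] += phi[i] * phi_next[j]

    model = {}
    for action, (d, e_next, e_rew) in stats.items():
        reg = [[d[i][j] + (ridge if i == j else 0.0) for j in range(dim)]
               for i in range(dim)]
        # Normal equations: row k of F_a solves (D + ridge I) x = E[k].
        f_mat = [solve(reg, e_next[k]) for k in range(dim)]
        f_vec = solve(reg, e_rew)
        model[action] = (f_mat, f_vec)
    return model


def predict(model, action, phi):
    """Predicted next feature vector and expected reward under the model."""
    f_mat, f_vec = model[action]
    phi_next = [sum(row[j] * phi[j] for j in range(len(phi))) for row in f_mat]
    reward = sum(w * x for w, x in zip(f_vec, phi))
    return phi_next, reward


if __name__ == "__main__":
    true_f = [[0.5, 0.1], [0.0, 0.9]]  # hypothetical ground-truth dynamics
    true_w = [1.0, -1.0]
    data = []
    for phi in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        nxt = [sum(c * x for c, x in zip(row, phi)) for row in true_f]
        data.append((phi, "a0", sum(w * x for w, x in zip(true_w, phi)), nxt))
    print(predict(fit_linear_action_model(data, dim=2), "a0", [2.0, 1.0]))
```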

In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions. We also introduce the concept of factored linear action models, a special case. Again, the relation of factored linear…
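To make the relaxed kernel concrete, the sketch below checks whether a matrix is a probability kernel and evaluates V = r + γ P V by fixed-point iteration for a hypothetical signed matrix `P_PSEUDO`. The requirement γ ‖P‖_∞ < 1 is a simple sufficient condition for the iteration to contract, assumed here for illustration; it is not necessarily the condition used in the paper.

```python
# Hedged illustration: a "kernel" with a negative entry is not a probability
# kernel, yet V = r + gamma * P V can still be solved when the update contracts
# (sufficient condition assumed here: gamma * max absolute row sum < 1).


def is_probability_kernel(p, tol=1e-12):
    """True iff every entry is non-negative and every row sums to 1."""
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in p)


def evaluate(p, reward, gamma, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration V <- r + gamma * P V for a possibly signed kernel P."""
    norm = max(sum(abs(x) for x in row) for row in p)  # induced infinity-norm
    if gamma * norm >= 1.0:
        raise ValueError("gamma * ||P||_inf >= 1: contraction not guaranteed")
    v = [0.0] * len(reward)
    for _ in range(max_iter):
        new = [r + gamma * sum(pij * vj for pij, vj in zip(row, v))
               for r, row in zip(reward, p)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v


# Hypothetical pseudo-kernel: rows sum to 1 but one entry is negative.
P_PSEUDO = [[1.2, -0.2],
            [0.3, 0.7]]

if __name__ == "__main__":
    print(is_probability_kernel(P_PSEUDO))            # False
    print(evaluate(P_PSEUDO, [1.0, 0.0], gamma=0.5))  # approx [26/11, 6/11]
```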

We consider linear prediction problems in a stochastic environment. The least mean square (LMS) algorithm is a well-known, easy-to-implement and computationally cheap solution to this problem. However, as is well known, the LMS algorithm, being a stochastic gradient descent rule, may converge slowly. The recursive least squares (RLS) algorithm overcomes…
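For reference, textbook LMS and RLS (forgetting factor 1) can be sketched as follows on noise-free synthetic data. The step size 0.1, the initialisation P = I/δ with δ = 1e-4, and the helper `make_samples` are assumptions for the demo; the paper's own algorithm is not shown because the abstract is truncated.

```python
# Standard textbook LMS and RLS (forgetting factor 1) for linear prediction
# y ~ w . x; a sketch for comparison, not the method proposed in the paper.
import random


def lms(samples, dim, step=0.1):
    """Least mean squares: one stochastic-gradient step per sample."""
    w = [0.0] * dim
    for x, y in samples:
        err = y - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + step * err * xi for wi, xi in zip(w, x)]
    return w


def rls(samples, dim, delta=1e-4):
    """Recursive least squares with P initialised to I / delta."""
    w = [0.0] * dim
    p = [[(1.0 / delta if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    for x, y in samples:
        px = [sum(p[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        gain = [v / (1.0 + sum(xi * vi for xi, vi in zip(x, px))) for v in px]
        err = y - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + gi * err for wi, gi in zip(w, gain)]
        # P <- P - k (P x)^T, using the symmetry of P (x^T P = (P x)^T).
        p = [[p[i][j] - gain[i] * px[j] for j in range(dim)] for i in range(dim)]
    return w


def make_samples(w_true, n, seed=0):
    """Noise-free synthetic data y = w_true . x with x uniform in [-1, 1]^d."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        out.append((x, sum(a * b for a, b in zip(w_true, x))))
    return out


if __name__ == "__main__":
    W_TRUE = [2.0, -1.0]
    data = make_samples(W_TRUE, 20)
    for name, w in (("LMS", lms(data, 2)), ("RLS", rls(data, 2))):
        print(name, max(abs(a - b) for a, b in zip(w, W_TRUE)))
```

LMS costs O(d) per sample and RLS O(d^2), which is the usual speed-versus-cost trade-off between the two.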

We introduce a new framework for web page ranking, reinforcement ranking, that improves the stability and accuracy of PageRank while eliminating the need to compute the stationary distribution of random walks. Instead of relying on teleportation to ensure a well-defined Markov chain, we develop a reverse-time reinforcement learning framework that…
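For contrast with the teleportation the abstract mentions, standard PageRank can be sketched as power iteration in which a fraction 1 - d of the rank is spread uniformly over all pages and dangling pages redistribute their rank uniformly. The damping d = 0.85 and the dangling-page rule are common conventions assumed here; this is the baseline being replaced, not the reinforcement-ranking method.

```python
# Standard PageRank by power iteration with teleportation (baseline sketch).


def pagerank(links, n, damping=0.85, tol=1e-12, max_iter=1000):
    """links: dict node -> list of out-neighbours; nodes are 0..n-1."""
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n  # teleportation mass, spread uniformly
        dangling = sum(rank[u] for u in range(n) if not links.get(u))
        for u in range(n):
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        for v in range(n):
            new[v] += damping * dangling / n  # pages without out-links jump anywhere
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    print(pagerank({0: [2], 1: [2], 2: []}, 3))  # page 2 collects the most rank
```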

- Hengshuai Yao
- ArXiv
- 2012

On the Web, visits to a page often arrive through one or more valuable linking sources. Indeed, good backlinks are valuable resources for Web pages and sites. We propose discovering and leveraging the best backlinks of pages for ranking. Similar to PageRank, MaxRank scores are updated recursively. In particular, with probability λ, the MaxRank of a…

- Hengshuai Yao, Zhi-Qiang Liu
- ICML
- 2008

This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE) and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning technique that solves a stochastic model equation. This paper…
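The preconditioning idea can be illustrated, under stated assumptions, as solving the sampled model equation A θ = b with the iteration θ ← θ + α C (b - A θ). Choosing C = A^-1 reaches the LSTD(0) solution in one step, while C = D^-1 with D = Σ φ φ^T gives an LSPE(0)-style iteration. The function names and the two-state example are hypothetical, and this is a sketch of the general viewpoint rather than the paper's algorithms.

```python
# Hedged sketch of the preconditioning view of LSTD(0) / LSPE(0)-style updates.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def td_statistics(samples, dim, gamma):
    """A = sum phi (phi - gamma phi')^T, b = sum phi r, D = sum phi phi^T."""
    a = [[0.0] * dim for _ in range(dim)]
    d = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for phi, reward, phi_next in samples:
        for i in range(dim):
            b[i] += phi[i] * reward
            for j in range(dim):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
                d[i][j] += phi[i] * phi[j]
    return a, b, d


def preconditioned_solve(a, b, precond, alpha=1.0, tol=1e-12, max_iter=10_000):
    """Iterate theta <- theta + alpha * precond^-1 (b - A theta)."""
    n = len(b)
    theta = [0.0] * n
    for _ in range(max_iter):
        residual = [b[i] - sum(a[i][j] * theta[j] for j in range(n)) for i in range(n)]
        step = solve(precond, residual)
        theta = [t + alpha * s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


# Hypothetical two-state cycle with one-hot features: s0 -> s1 (reward 1),
# s1 -> s0 (reward 0), gamma = 0.9. Exact values are 1/0.19 and 0.9/0.19.
SAMPLES = [([1.0, 0.0], 1.0, [0.0, 1.0]),
           ([0.0, 1.0], 0.0, [1.0, 0.0])]

if __name__ == "__main__":
    A, B, D = td_statistics(SAMPLES, dim=2, gamma=0.9)
    print("LSTD-style:", preconditioned_solve(A, B, precond=A))
    print("LSPE-style:", preconditioned_solve(A, B, precond=D))
```

Both choices of preconditioner share the fixed point A^-1 b; they differ in how many iterations they need and in how expensive each preconditioned step is.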

In this paper we introduce a multi-step linear Dyna-style planning algorithm. The key element of the multi-step linear Dyna is a multi-step linear model that enables multi-step projection of a sampled feature and multi-step planning based on the simulated multi-step transition experience. We propose two multi-step linear models. The first iterates the…
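The abstract's two multi-step models are cut off, so this sketch assumes the simplest option: a known one-step linear model φ' ≈ F φ, r ≈ f · φ iterated k times to project a sampled feature, followed by a TD-style planning update on the projected feature. Names such as `k_step_projection` and the two-state example are hypothetical, and the sketch is not claimed to be either of the paper's models.

```python
# Hedged sketch: k-step projection by iterating an (assumed known) one-step
# linear model phi' ~ F phi, r ~ f . phi, followed by a TD-style planning update.


def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def k_step_projection(f_mat, f_vec, phi, k, gamma):
    """Return (discounted k-step reward, projected feature F^k phi)."""
    reward, discount, x = 0.0, 1.0, phi
    for _ in range(k):
        reward += discount * dot(f_vec, x)
        x = mat_vec(f_mat, x)
        discount *= gamma
    return reward, x


def dyna_planning(theta, f_mat, f_vec, sampled_features, k, gamma, alpha, sweeps):
    """Repeated k-step planning updates on sampled features (no real experience)."""
    for _ in range(sweeps):
        for phi in sampled_features:
            reward, phi_k = k_step_projection(f_mat, f_vec, phi, k, gamma)
            delta = reward + gamma ** k * dot(theta, phi_k) - dot(theta, phi)
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


# Hypothetical two-state cycle with one-hot features: F swaps the states and
# the reward weights give reward 1 on leaving state 0.
F = [[0.0, 1.0], [1.0, 0.0]]
REWARD_WEIGHTS = [1.0, 0.0]
FEATURES = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    print(dyna_planning([0.0, 0.0], F, REWARD_WEIGHTS, FEATURES,
                        k=3, gamma=0.9, alpha=0.5, sweeps=500))
```

With an exact model, the planning fixed point coincides with the true values (here 1/0.19 and 0.9/0.19) for any k; a larger k shrinks the bootstrapping factor to γ^k at the cost of more model applications per update.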

- Hengshuai Yao, Diao Dongcui, Zengqi Sun
- First International Multi-Symposiums on Computer…
- 2006

In this paper, we develop a multi-step prediction algorithm that is guaranteed to converge when using general function approximation. In addition, the new algorithm should satisfy the following requirements: first, it does not have to be faster than TD(0) in the look-up table representation; however, the new algorithm should be faster than residual gradient…
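The two baselines named in the abstract can be sketched with linear function approximation: TD(0) updates along φ, while residual gradient follows the gradient of the squared TD error and updates along φ - γ φ'. The two-state example, the step size and the sweep count are assumptions; the paper's new algorithm is not reproduced here.

```python
# Standard linear TD(0) and residual-gradient updates (illustrative baselines
# named in the abstract; not the paper's new algorithm).


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td0(samples, dim, gamma, alpha, sweeps):
    """TD(0): theta += alpha * delta * phi."""
    theta = [0.0] * dim
    for _ in range(sweeps):
        for phi, reward, phi_next in samples:
            delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


def residual_gradient(samples, dim, gamma, alpha, sweeps):
    """Residual gradient: theta += alpha * delta * (phi - gamma * phi')."""
    theta = [0.0] * dim
    for _ in range(sweeps):
        for phi, reward, phi_next in samples:
            delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
            theta = [t + alpha * delta * (p - gamma * q)
                     for t, p, q in zip(theta, phi, phi_next)]
    return theta


# Hypothetical deterministic two-state cycle with one-hot features.
SAMPLES = [([1.0, 0.0], 1.0, [0.0, 1.0]),
           ([0.0, 1.0], 0.0, [1.0, 0.0])]
EXACT = [1 / 0.19, 0.9 / 0.19]

if __name__ == "__main__":
    for name, fn in (("TD(0)", td0), ("RG", residual_gradient)):
        theta = fn(SAMPLES, 2, gamma=0.9, alpha=0.1, sweeps=2000)
        print(name, max(abs(t - e) for t, e in zip(theta, EXACT)))
```

On this toy problem both methods share the same fixed point, but residual gradient moves much more slowly along a poorly conditioned direction, which is one reason a convergent yet faster alternative is attractive.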