
In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework.
In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is provided with a “reset” operation that interrupts the…

In many optimization and decision problems the objective function can be expressed as a linear combination of competing criteria, the weights of which specify the relative importance of the criteria for the user. We consider the problem of learning such a "subjective" function from preference judgments collected from traces of user interactions. We…

We propose a model of efficient on-line reinforcement learning based on the expected mistake bound framework introduced by Haussler, Littlestone and Warmuth (1987). The measure of performance we use is the expected difference between the total reward received by the learning agent and that received by an agent behaving optimally from the start. We call this…

Current route advice systems present a single route to the driver based on static evaluation criteria, with little or no recourse if the driver finds this solution unsatisfactory. In this paper, we propose a more flexible approach and its implementation in the Adaptive Route Advisor. Our system behaves more like a human travel agent, using driver…

We consider a special case of reinforcement learning where the environment can be described by a linear system. The states of the environment and the actions the agent can perform are represented by real vectors, and the system dynamics are given by a linear equation with a stochastic component. The problem is equivalent to the so-called linear quadratic…
