
In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework.
In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is provided with a “reset” operation that interrupts the… (More)

In many optimization and decision problems the objective function can be expressed as a linear combination of competing criteria, the weights of which specify the relative importance of the criteria for the user. We consider the problem of learning such a "subjective" function from preference judgments collected from traces of user interactions. We… (More)

We propose a model of efficient on-line reinforcement learning based on the expected mistake bound framework introduced by Haussler, Littlestone and Warmuth (1987). The measure of performance we use is the expected difference between the total reward received by the learning agent and that received by an agent behaving optimally from the start. We call this… (More)

Current route advice systems present a single route to the driver based on static evaluation criteria, with little or no recourse if the driver finds this solution unsatisfactory. In this paper, we propose a more flexible approach and its implementation in the Adaptive Route Advisor. Our system behaves more like a human travel agent, using driver… (More)

In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework. In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is provided with a “reset” operation that interrupts the current… (More)

General algorithms for the reinforcement learning problem typically learn policies in the form of a table that directly maps the states of the environment into actions. When the state-space is large these methods become impractical. One approach to increase efficiency is to restrict the class of policies by considering only policies that can be described… (More)

Our research group is investigating the use of adaptive user interfaces for in-car information access. These interfaces attempt to efficiently provide content the driver needs and wants, and gather feedback on these preferences through the driver's interaction with the system. In this way, the performance of the system improves as it unobtrusively builds a… (More)

We consider a special case of reinforcement learning where the environment can be described by a linear system. The states of the environment and the actions the agent can perform are represented by real vectors and the system dynamic is given by a linear equation with a stochastic component. The problem is equivalent to the so-called linear quadratic… (More)

- Pat Langley, Claude-Nicolas Fiechter, Mehmet Göker, Cynthia Thompson, Andrea Danyluk, Claude Sammut +4 others
- 2005

ICML-2005 reviewing will be blind to the identities of the authors, and therefore identifying information should not appear in papers submitted for review. As in the past few years, ICML-2005 will rely heavily on electronic formats for submission and review. We assume that nearly all authors will have access to standard software for word processing,… (More)
