On Estimating Action Regret and Learning From It in Route Choice

Abstract

The notion of regret has been extensively employed to measure the performance of reinforcement learning agents. The regret of an agent measures how much worse it performs following its current policy in comparison to following the best possible policy. As such, measuring regret requires complete knowledge of the environment. However, such an assumption is not realistic in most multiagent scenarios. In this paper, we address the route choice problem, in which each driver must choose the best route between its origin and its destination. The expected outcome corresponds to an equilibrium point in the space of policies where no driver benefits from deviating from its policy, a concept known as User Equilibrium (UE). Considering the limited observability of such a scenario, we investigate how the agents can estimate their regret based exclusively on their own experience. To this end, we introduce the concept of estimated action regret, through which an agent can estimate how much worse it performs by taking a given action rather than the best one in hindsight. Additionally, we show how agents can use such estimates as a reinforcement signal to improve their performance. We empirically evaluate our approach in different route choice scenarios, showing that the agents produce reasonable estimates of their regret. Furthermore, we show that using such estimates as the reinforcement signal provides good approximations to the UE.
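The abstract describes estimated action regret only informally. Under one plausible reading, an agent maintains an experienced average cost c(a) for each action (route) a and estimates the regret of a as c(a) - min_b c(b), i.e., how much worse a is than the best action it has tried so far; the negated estimate then serves as the reinforcement signal. The minimal Python sketch below illustrates this reading; the class RegretEstimatingDriver, its method names, and the epsilon-greedy value update are illustrative assumptions, not the paper's actual algorithm.

import random

class RegretEstimatingDriver:
    """One driver that estimates per-action regret from its own
    experience and learns from it (illustrative sketch, not the
    paper's exact algorithm)."""

    def __init__(self, n_routes, alpha=0.1, epsilon=0.1):
        self.n_routes = n_routes
        self.alpha = alpha                  # learning rate for the value update
        self.epsilon = epsilon              # exploration probability
        self.avg_cost = [0.0] * n_routes    # experienced mean travel time per route
        self.counts = [0] * n_routes        # how often each route was taken
        self.q = [0.0] * n_routes           # action values learned from negated regret

    def choose_route(self):
        # epsilon-greedy over the learned action values
        if random.random() < self.epsilon:
            return random.randrange(self.n_routes)
        return max(range(self.n_routes), key=lambda a: self.q[a])

    def update(self, route, travel_time):
        # incremental mean of the experienced cost of the chosen route
        self.counts[route] += 1
        self.avg_cost[route] += (travel_time - self.avg_cost[route]) / self.counts[route]
        # best cost in hindsight, estimated from experienced routes only
        best = min(self.avg_cost[a] for a in range(self.n_routes) if self.counts[a] > 0)
        # estimated action regret: excess cost of this route over the best one tried
        est_regret = self.avg_cost[route] - best
        # negated regret as the reinforcement signal
        self.q[route] += self.alpha * (-est_regret - self.q[route])

# Toy usage: three routes with fixed free-flow costs plus noise.
driver = RegretEstimatingDriver(n_routes=3)
for _ in range(1000):
    route = driver.choose_route()
    travel_time = (10.0, 12.0, 15.0)[route] + random.random()
    driver.update(route, travel_time)
print(driver.q)   # route 0 should end up with the highest (least negative) value

In this toy run the costs are fixed, so the learned values simply concentrate on the cheapest route; in the paper's setting, travel times would instead depend on how many drivers share each link, which is what drives the system toward the UE.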

Cite this paper

@inproceedings{Ramos2016OnEA,
  title     = {On Estimating Action Regret and Learning From It in Route Choice},
  author    = {Gabriel de Oliveira Ramos and Ana L. C. Bazzan},
  booktitle = {ATT@IJCAI},
  year      = {2016}
}