Feudal Q-learning


One popular way of exorcising the demon of dimensionality in dynamic programming is to consider spatial and temporal hierarchies for representing the value functions and policies. This paper develops a hierarchical method for Q-learning based on the familiar notion of a recursive feudal serfdom: managers set tasks for their juniors and give them rewards and punishments, and in turn receive tasks, rewards, and punishments from their own superiors. We show how one such system performs in a navigation task, based on a manual division of the state space at successively coarser resolutions. Links with other hierarchical systems are discussed.
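
To make the idea of a manual division of the state space at successively coarser resolutions concrete, here is a minimal sketch in Python. It is not taken from the paper. It assumes a square 8x8 grid world in which each region at one level covers a 2x2 block of regions at the level below. The names `GRID_SIZE`, `region_at_level`, `chain_of_command`, and `q_update` are hypothetical, and the update shown is the standard one-step tabular Q-learning rule, with the reward understood to come from the manager one level up rather than directly from the environment.

```python
from collections import defaultdict

GRID_SIZE = 8  # assumed side length of the finest grid (illustrative only)


def region_at_level(cell, level):
    """Map a finest-level (row, col) cell to the region containing it at `level`.

    Level 0 is the finest resolution. Each step up halves the resolution along
    both axes, so one region covers a 2x2 block of the level below (assumption).
    """
    row, col = cell
    return (row >> level, col >> level)


def chain_of_command(cell, grid_size=GRID_SIZE):
    """List the regions containing `cell`, from the finest level to the coarsest."""
    levels = grid_size.bit_length()  # 8 -> 4 levels: 8x8, 4x4, 2x2, 1x1
    return [region_at_level(cell, lvl) for lvl in range(levels)]


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update for a single level.

    In a feudal arrangement, `reward` would be whatever the manager one level
    up hands down for the task it set (an assumption of this sketch).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Usage: which regions (and hence which managers) oversee cell (5, 6)?
chain = chain_of_command((5, 6))

# Usage: one update of a fresh Q-table at some level.
q_table = defaultdict(float)
q_update(q_table, "s", "north", 1.0, "s_next", ["north", "south"])
```

Keeping a separate Q-table per level, each keyed by that level's regions, is one simple way to let every manager learn only at its own resolution; other layouts are equally plausible.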

Cite this paper

@inproceedings{Dayan1995FeudalQ, title={Feudal Q-learning}, author={Peter Dayan}, year={1995} }