Relative Value Function Approximation

Paul E. Utgoff and Doina Precup
A form of temporal difference learning is presented that learns the relative utility of states rather than their absolute utility. This formulation backs up decisions instead of values, making it possible to learn a simpler function for defining a decision-making policy. A nonlinear relative value function can be learned without increasing the dimensionality of the inputs.