Actual Return Reinforcement Learning versus Temporal Differences: Some Theoretical and Experimental Results

@inproceedings{Pendrith1996ActualRR,
  title={Actual Return Reinforcement Learning versus Temporal Differences: Some Theoretical and Experimental Results},
  author={Mark D. Pendrith and Malcolm R. K. Ryan},
  booktitle={ICML},
  year={1996}
}
This paper argues that for many domains, we can expect credit-assignment methods that use actual returns to be more effective for reinforcement learning than the more commonly used temporal difference methods. We present analysis and empirical evidence from three sets of experiments in different domains to support this claim. A new algorithm we call C-Trace, a variant of the P-Trace RL algorithm, is introduced, and some possible advantages of using algorithms of this type are discussed.
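The contrast the abstract draws, updating toward the full observed return versus a one-step bootstrapped estimate, can be sketched with tabular value updates. This is a hypothetical illustration of the two credit-assignment styles, not an implementation of the paper's C-Trace or P-Trace algorithms:

```python
def mc_update(V, episode, alpha=0.1, gamma=0.9):
    """Actual-return (Monte Carlo style) credit assignment:
    each visited state moves toward the full observed return G."""
    G = 0.0
    # Walk the episode backwards, accumulating the discounted return.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V

def td0_update(V, episode, alpha=0.1, gamma=0.9):
    """Temporal-difference (TD(0)) credit assignment:
    each state moves toward the one-step bootstrapped target."""
    for i, (state, reward) in enumerate(episode):
        # Value of the successor state, or 0 at episode termination.
        next_v = V[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
        V[state] += alpha * (reward + gamma * next_v - V[state])
    return V

# A toy episode: (state, reward received on leaving that state).
episode = [("s0", 0.0), ("s1", 1.0)]
V_mc = mc_update({"s0": 0.0, "s1": 0.0}, list(episode))
V_td = td0_update({"s0": 0.0, "s1": 0.0}, list(episode))
print(V_mc, V_td)
```

After a single episode, the actual-return update has already credited the early state `s0` with discounted reward, while TD(0) leaves it unchanged because the successor's value estimate is still zero; the information propagates back only over repeated episodes.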