Reinforcement learning has gained wide popularity as a technique for simulation-driven approximate dynamic programming. A less known aspect is that the very reasons that make it effective in dynamic programming can also be leveraged for using it for distributed schemes for certain matrix computations involving non-negative matrices. In this spirit, we… (More)
We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems.