Learn More
We propose a reinforcement learning scheme called CHQ that could efficiently acquire appropriate policies under partially observable Markov decision processes (POMDP) involving probabilistic state transitions, that frequently occurs in multiagent systems in which each agent independently takes a probabilistic action based on a partial observation of the(More)
  • 1