A statistical property of multiagent learning based on Markov decision process

@article{Iwata2006ASP,
  title={A statistical property of multiagent learning based on Markov decision process},
  author={Kazunori Iwata and K. Ikeda and Hideaki Sakai},
  journal={IEEE Transactions on Neural Networks},
  year={2006},
  volume={17},
  number={4},
  pages={829--842}
}
We exhibit an important property, the asymptotic equipartition property (AEP), of empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind…
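The abstract appeals to the AEP: for an ergodic process, the per-symbol log-probability of a typical empirical sequence converges to the entropy rate. A minimal numerical sketch on a hypothetical two-state Markov chain (the transition matrix is illustrative, not taken from the paper):

```python
import numpy as np

# Two-state ergodic Markov chain: P[i, j] = Pr(next = j | current = i).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Entropy rate H = -sum_i pi_i sum_j P_ij log P_ij (in nats).
H = -np.sum(pi[:, None] * P * np.log(P))

# Sample a long trajectory and compute -(1/n) log p(x_1, ..., x_n).
rng = np.random.default_rng(0)
n = 100_000
x = [0]
for _ in range(n - 1):
    x.append(rng.choice(2, p=P[x[-1]]))
log_p = np.log(pi[x[0]]) + sum(np.log(P[x[t], x[t + 1]]) for t in range(n - 1))

# By the AEP, -(1/n) log p approaches the entropy rate H as n grows.
print(H, -log_p / n)
```

The two printed values agree closely for long trajectories, which is the content of the AEP for this chain.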
An Information-Spectrum Approach to Analysis of Return Maximization in Reinforcement Learning
TLDR
This paper gives an information-spectrum analysis of return maximization in more general processes than stationary or ergodic Markov decision processes, and presents a class of stochastic sequential decision processes with the necessary condition for return maximization.
An Information-Theoretic Class of Stochastic Decision Processes
  • Kazunori Iwata
  • Computer Science, Mathematics
    2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
  • 2008
TLDR
Using an information-theoretic property, this paper shows a class of stochastic decision processes in reinforcement learning in which return maximization occurs with a positive probability.
The Convergence of a Cooperation Markov Decision Process System
TLDR
A Cooperation Markov Decision Process (CMDP) system with two agents is introduced, which is suitable for the learning evolution of cooperative decisions between two agents, and it is found that the value function in the CMDP system eventually converges.
An action-selection strategy insensitive to parameter-settings in reinforcement learning
TLDR
This paper improves an action-selection strategy to make it insensitive to parameter settings by using the stochastic complexity, which gives better policies for alleviating the exploration-exploitation dilemma in most parameter settings.
Multiagent Reinforcement Learning: Spiking and Nonspiking Agents in the Iterated Prisoner's Dilemma
TLDR
This paper investigates multiagent reinforcement learning (MARL) with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome.
Quasi-Synchronization for Periodic Neural Networks With Asynchronous Target and Constrained Information
TLDR
A new period is established based on the lowest common multiple of the periods of the target dynamic and the followers to obtain an augmented synchronization error system (ASES); a suboptimal iterative algorithm is proposed to reduce the quasi-synchronization (QS) range of the ASES, and the corresponding controllers are designed.
Information Geometry and Information Theory in Machine Learning
TLDR
The asymptotic equipartition property is an essential concept in information theory; an example in a Markov decision process is given, and its relation to return maximization in reinforcement learning is shown.
Q-learning based object grasping control strategy for home service robot with rotatable waist
TLDR
In this paper, a Q-learning based object grasping strategy and control method is proposed for a home service robot with a rotatable waist, and the position of the end-effector is calibrated using an ultrasonic ranging module.

References

Showing 1–10 of 34 references
A multiagent reinforcement learning algorithm by dynamically merging Markov decision processes
TLDR
A new learning algorithm called MAPLE (MultiAgent Policy LEarning) is presented that uses Q-learning and dynamic merging to efficiently construct global solutions to the overall multiagent problem from solutions to multiple Markov decision processes.
A Generalized Reinforcement-Learning Model: Convergence and Applications
TLDR
This paper shows how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases.
A multiagent reinforcement learning algorithm using extended optimal response
TLDR
A multiagent reinforcement learning algorithm that will converge to a Nash equilibrium when other agents are adaptable, and otherwise will make an optimal response; empirical results in three simple stochastic games show that the algorithm behaves as intended.
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
TLDR
A multiagent Q-learning method is designed under general-sum stochastic games, and it is proved that it converges to a Nash equilibrium under specified conditions.
Cooperative Multi-Agent Learning: The State of the Art
TLDR
This survey attempts to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics, and finds that this broad view leads to a division of the work into two categories.
Nash Q-Learning for General-Sum Stochastic Games
TLDR
This work extends Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games, and implements an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
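Several of the entries above build on tabular Q-learning. A minimal sketch of the standard update Q(s,a) ← Q(s,a) + α(r + γ maxₐ′ Q(s′,a′) − Q(s,a)) on a hypothetical two-state MDP (this is plain single-agent Q-learning, not a Nash-Q implementation; the rewards and transitions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-state, 2-action MDP (hypothetical numbers for illustration).
# R[s, a]: immediate reward; T[s, a]: deterministic next state.
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
T = np.array([[0, 1],
              [1, 0]])

Q = np.zeros((2, 2))
alpha, gamma, eps = 0.1, 0.9, 0.1
s = 0
for _ in range(20_000):
    # Epsilon-greedy action selection.
    a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
    s2 = T[s, a]
    # Tabular Q-learning update.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.argmax(Q, axis=1))  # greedy policy per state
```

With enough exploration, the learned greedy policy moves to state 1 and stays there collecting the larger reward; Nash Q-learning replaces the max over own actions with the value of a stage-game Nash equilibrium over joint actions.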
A new criterion using information gain for action selection strategy in reinforcement learning
TLDR
Using the sequence of returns as outputs from a parametric compound source, the ratio ω of return loss to information gain is proposed as a new criterion to be used in probabilistic action-selection strategies.
Coordinating Multiple Agents via Reinforcement Learning
TLDR
It is argued that it is important to explicitly model and explore coordination-specific information, which underpins the two algorithms and contributes to their effectiveness.