A statistical property of multiagent learning based on Markov decision process
@article{Iwata2006ASP,
  title={A statistical property of multiagent learning based on Markov decision process},
  author={Kazunori Iwata and K. Ikeda and Hideaki Sakai},
  journal={IEEE Transactions on Neural Networks},
  year={2006},
  volume={17},
  number={4},
  pages={829--842}
}
We exhibit an important property called the asymptotic equipartition property (AEP) on empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind…
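For context on the abstract's central tool, the AEP can be stated in its standard information-theoretic form (the textbook Shannon–McMillan–Breiman formulation, not the paper's multiagent-MDP version):

```latex
% Asymptotic equipartition property (Shannon--McMillan--Breiman theorem):
% for a stationary ergodic process $\{X_t\}$ with entropy rate $H$,
\[
  -\frac{1}{n}\log p(X_1, X_2, \dots, X_n)
  \;\xrightarrow{\ \text{a.s.}\ }\; H
  \qquad (n \to \infty),
\]
% so long empirical sequences concentrate on a ``typical set'' of roughly
% $2^{nH}$ sequences, each with probability about $2^{-nH}$.
```

The paper's contribution is establishing an analogue of this concentration for empirical sequences generated by an ergodic multiagent MDP, which is what makes the near-convergence behavior of the learning process analyzable.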
11 Citations
An information-theoretic analysis of return maximization in reinforcement learning
- Neural Networks
- 2011
An Information-Spectrum Approach to Analysis of Return Maximization in Reinforcement Learning
- ICONIP
- 2010
This paper gives an information-spectrum analysis of return maximization in more general processes than stationary or ergodic Markov decision processes, and presents a class of stochastic sequential decision processes with the necessary condition for return maximization.
An Information-Theoretic Class of Stochastic Decision Processes
- 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
- 2008
Using an information-theoretic property, this paper shows a class of stochastic decision processes in reinforcement learning in which return maximization occurs with a positive probability.
The Convergence of a Cooperation Markov Decision Process System
- Entropy
- 2020
A Cooperation Markov Decision Process (CMDP) system with two agents is introduced, which is suitable for the learning evolution of cooperative decision between two agents, and it is found that the value function in the CMDP system also converges in the end.
An action-selection strategy insensitive to parameter-settings in reinforcement learning
- 2009 ICCAS-SICE
- 2009
This paper improves an action-selection strategy to make it insensitive to parameter settings by using the stochastic complexity, which gives better policies for alleviating the exploration-exploitation dilemma in most parameter settings.
Multiagent Reinforcement Learning: Spiking and Nonspiking Agents in the Iterated Prisoner's Dilemma
- IEEE Transactions on Neural Networks
- 2011
This paper investigates multiagent reinforcement learning (MARL) with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome.
Quasi-Synchronization for Periodic Neural Networks With Asynchronous Target and Constrained Information
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
- 2021
A new period is established based on the lowest common multiple period of the target dynamic and the followers to obtain an augmented synchronization error system (ASES), a suboptimal iterative algorithm is proposed to cut down the QS range of the ASES and the corresponding controllers are designed.
Information Geometry and Information Theory in Machine Learning
- ICONIP
- 2007
The asymptotic equipartition property is one of the essentials of information theory; an example in a Markov decision process is given, and its relation to return maximization in reinforcement learning is shown.
Q-learning based object grasping control strategy for home service robot with rotatable waist
- 2014 International Conference on Machine Learning and Cybernetics
- 2014
In this paper, a Q-learning based object grasping strategy and control method is proposed for the home service robot with a rotatable waist and the position of end-effector is calibrated using an ultrasonic ranging module.
A New Machine Learning Framework for Air Combat Intelligent Virtual Opponent
- Journal of Physics: Conference Series
- 2018
References
Showing 1-10 of 34 references
A multiagent reinforcement learning algorithm by dynamically merging markov decision processes
- AAMAS '02
- 2002
A new learning algorithm called MAPLE (MultiAgent Policy LEarning) is presented that uses Q-learning and dynamic merging to efficiently construct global solutions to the overall multiagent problem from solutions to multiple Markov decision processes.
A Generalized Reinforcement-Learning Model: Convergence and Applications
- ICML
- 1996
This paper shows how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases.
The asymptotic equipartition property in reinforcement learning and its relation to return maximization
- Neural Networks
- 2006
A multiagent reinforcement learning algorithm using extended optimal response
- AAMAS '02
- 2002
A multiagent reinforcement learning algorithm that will converge to a Nash equilibrium when other agents are adaptable, otherwise it will make an optimal response, and some empirical results in three simple stochastic games show that the algorithm can realize what it intends.
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
- ICML
- 1998
A multiagent Q-learning method is designed under general-sum stochastic games, and it is proved that it converges to a Nash equilibrium under specified conditions.
Cooperative Multi-Agent Learning: The State of the Art
- Autonomous Agents and Multi-Agent Systems
- 2005
This survey attempts to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics, and finds that this broad view leads to a division of the work into two categories.
Nash Q-Learning for General-Sum Stochastic Games
- J. Mach. Learn. Res.
- 2003
This work extends Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games, and implements an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
Multi-agent reinforcement learning: weighting and partitioning
- Neural Networks
- 1999
A new criterion using information gain for action selection strategy in reinforcement learning
- IEEE Transactions on Neural Networks
- 2004
Using the sequence of returns as outputs from a parametric compound source, the ratio ω of return loss to information gain is proposed as a new criterion to be used in probabilistic action-selection strategies.
Coordinating Multiple Agents via Reinforcement Learning
- Autonomous Agents and Multi-Agent Systems
- 2004
It is argued that it is important to explicitly model and explore coordination-specific information, which underpins the two algorithms and attributes to the effectiveness of the algorithms.