A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game

@article{ishii_hearts,
  title={A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game},
  author={Shinya Ishii and Hajime Fujita and Masaoki Mitsutake and Tatsuya Yamazaki and Jun Matsuda and Yoichiro Matsuno},
  journal={Machine Learning},
}
We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning problem. The problem can approximately be dealt with in the framework of a partially observable Markov decision process (POMDP) for a single-agent system. Hearts is an example of an imperfect-information game, which is more difficult to deal with than perfect-information games. A POMDP is a decision problem that includes a process for estimating unobservable state variables…
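The POMDP framework mentioned in the abstract rests on maintaining a belief, i.e. a probability distribution over the unobservable state, and updating it Bayesianly after each action and observation. The following is a minimal sketch of the standard discrete belief update, not the paper's specific estimation method; the toy 2-state transition and observation matrices are illustrative assumptions.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """Standard Bayesian belief update for a discrete POMDP.
    belief: (S,) prior distribution over states
    T:      (A, S, S) transition probabilities T[a, s, s']
    O:      (A, S, Obs) observation probabilities O[a, s', o]
    """
    predicted = belief @ T[action]           # predict next-state distribution
    unnorm = predicted * O[action][:, obs]   # weight by observation likelihood
    return unnorm / unnorm.sum()             # renormalize

# Toy 2-state, 1-action, 2-observation example (illustrative numbers)
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, action=0, obs=0)
```

The agent then chooses actions as a function of the belief rather than the raw (aliased) observation.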
Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation
This work presents a model-based RL scheme for large-scale multiagent problems with partial observability and applies it to the card game Hearts; a sampling technique approximates the heavy integration required for estimation and prediction with a plausible number of samples.
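The sampling idea summarized above is Monte Carlo approximation: an expectation over unobservable states (e.g. opponents' hidden hands) that would require an intractable sum is replaced by an average over sampled candidate states. A minimal generic sketch, with the sampler and value function as hypothetical placeholders:

```python
import random

def expected_value_by_sampling(sample_hidden_state, value_fn, n_samples=1000):
    """Approximate E[value(s)] over unobserved states by Monte Carlo sampling,
    replacing an intractable integration with a feasible number of samples."""
    total = 0.0
    for _ in range(n_samples):
        s = sample_hidden_state()  # draw one candidate hidden state
        total += value_fn(s)
    return total / n_samples

# Toy usage: hidden state is a card rank uniform in 1..13 (true mean = 7)
random.seed(0)
est = expected_value_by_sampling(lambda: random.randint(1, 13), float)
```

The estimate's error shrinks as O(1/sqrt(n_samples)), which is what makes "a plausible number of samples" sufficient in practice.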
Safe Policies for Factored Partially Observable Stochastic Games
This work defines a decoupling scheme for the POSG state space that—under certain assumptions on the observability and the reward structure—separates the state components relevant for the reward from those relevant for safety, and guarantees that any reward-maximal policy for the POMDP is also safe and reward-maximal for the POSG.
Feature Extraction for Decision-Theoretic Planning in Partially Observable Environments
A feature extraction technique for decision-theoretic planning problems in partially observable stochastic domains; the proposed approach can find an appropriate feature for acquiring a good policy and achieves faster learning with fewer policy parameters than a conventional algorithm.
Model-based reinforcement learning for a multi-player card game with partial observability
  • H. Fujita, S. Ishii
  • Computer Science
    IEEE/WIC/ACM International Conference on Intelligent Agent Technology
  • 2005
Simulation results show that the model-based RL method can produce an agent comparable to a human expert for this realistic problem.
Bayesian learning for multi-agent coordination
A principled Bayesian model is extended into more challenging domains, using Bayesian networks to visualise specific cases of the model and thus as an aid in deriving the update equations for the system, and an approximate scalable algorithm is developed.
Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1 - Fundamentals and Applications in Games, Robotics and Natural Language Processing
An overview of Markov Decision Processes (MDP) problems and Reinforcement Learning and applications of DRL for solving POMDP problems in games, robotics, and natural language processing is introduced.
A projective simulation scheme for partially observable multi-agent systems
It is shown that both game-theoretical notions of cooperation and competition are assignable to the PS multi-agent setting via, e.g., classes of coalitions, and Nash equilibriums, respectively.
A comprehensive survey of multi-agent reinforcement learning
The benefits and challenges of MARL are described along with some of the problem domains where MARL techniques have been applied, and an outlook for the field is provided.
Multi-agent reinforcement learning: An overview (Technical Report 10003)
This chapter reviews a representative selection of Multi-Agent Reinforcement Learning (MARL) algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
An Exploration of Multi-agent Learning Within the Game of Sheephead
This paper examines a machine learning technique presented by Ishii et al. for learning in a multi-agent environment, applies an adaptation of this technique to the card game Sheephead, and restores the Markov property needed to model the problem as a Markov decision problem.


A multi-agent reinforcement learning method for a partially-observable competitive game
A reinforcement learning (RL) method based on an actor-critic architecture that can be applied to partially-observable multi-agent competitive games is proposed and applied to the card game “Hearts”.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
Reinforcement Learning of Non-Markov Decision Processes
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
This paper studies three connectionist approaches which learn to use history to handle perceptual aliasing: the window-Q, recurrent-Q, and recurrent-model architectures.
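The window-Q idea is the simplest of the three: treat the last k observations as the state, so that histories which are indistinguishable from a single observation become distinguishable. A minimal sketch of the windowing itself (the Q-learning on top is omitted); the specifics here are illustrative, not the paper's architecture:

```python
from collections import deque

def make_windowed_state(window_size):
    """Window-Q idea: use the last k observations as the state key,
    mitigating perceptual aliasing in non-Markovian domains."""
    history = deque(maxlen=window_size)
    def observe(obs):
        history.append(obs)
        return tuple(history)  # hashable key for a tabular Q-function
    return observe

observe = make_windowed_state(3)
state = None
for o in ["A", "B", "A", "C"]:
    state = observe(o)
# state now holds the three most recent observations
```

The recurrent-Q and recurrent-model architectures instead learn what to remember, rather than fixing a window length in advance.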
Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.
Large-scale dynamic optimization using teams of reinforcement learning agents
This dissertation uses a team of RL agents, each of which is responsible for controlling one elevator car, to demonstrate the power of RL on a very large scale stochastic dynamic optimization problem of practical utility.
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
A multiagent Q-learning method is designed for general-sum stochastic games, and it is proved to converge to a Nash equilibrium under specified conditions.
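For intuition, here is a deliberately simplified sketch of two independent epsilon-greedy Q-learners on a repeated 2x2 matrix game. This is not the Nash-Q algorithm the abstract refers to (which additionally solves for a stage-game Nash equilibrium at each update); the payoff matrix and hyperparameters are illustrative assumptions.

```python
import random

def independent_q_learning(payoffs, episodes=5000, alpha=0.1, eps=0.2):
    """Two independent Q-learners on a repeated 2x2 matrix game.
    payoffs[a1][a2] -> (reward to agent 1, reward to agent 2)."""
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    random.seed(0)
    for _ in range(episodes):
        # epsilon-greedy action selection for each agent
        a1 = random.randrange(2) if random.random() < eps else max(range(2), key=lambda a: q1[a])
        a2 = random.randrange(2) if random.random() < eps else max(range(2), key=lambda a: q2[a])
        r1, r2 = payoffs[a1][a2]
        q1[a1] += alpha * (r1 - q1[a1])  # stateless Q update
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2

# Prisoner's Dilemma: action 0 = cooperate, 1 = defect
pd = [[(3, 3), (0, 5)], [(5, 0), (1, 1)]]
q1, q2 = independent_q_learning(pd)
```

In this game both learners settle on the dominant action (defect), which is also the game's unique Nash equilibrium; in general-sum games without dominant strategies, independent learning can fail to converge, which is the gap Nash-Q-style methods address.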
Elevator Group Control Using Multiple Reinforcement Learning Agents
The power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility is demonstrated, with results that in simulation surpass the best of the heuristic elevator control algorithms of which the author is aware.
Multi-agent reinforcement learning: an approach based on the other agent's internal model
A two-agent cooperation problem is considered, a multi-agent reinforcement learning method based on estimation of the other agent's actions is proposed, and good cooperative behaviors are achieved by the learning method.