Corpus ID: 221136022

Joint Policy Search for Multi-agent Collaboration with Imperfect Information

Yuandong Tian, Qucheng Gong, Tina Jiang
Learning good joint policies for multi-agent collaboration with imperfect information remains a fundamental challenge. While coordinate-ascent approaches (optimizing one agent's policy at a time, e.g., self-play) work with guarantees in two-player zero-sum games, in multi-agent cooperative settings they often converge to a sub-optimal Nash equilibrium. On the other hand, directly modeling joint policy changes in imperfect-information games is nontrivial due to the complicated interplay of policies (e…
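The coordinate-ascent scheme the abstract contrasts against (optimize one agent's policy while freezing the others) can be sketched as follows. This is a hypothetical minimal illustration, not the paper's algorithm; `joint_return` and `candidates` are placeholder callables standing in for policy evaluation and a per-agent policy space:

```python
import random

def coordinate_ascent(policies, joint_return, candidates, iters=100):
    """Greedy coordinate ascent over a joint policy: improve one agent's
    policy at a time while holding all other agents fixed.

    policies: list of per-agent policies
    joint_return: callable mapping a list of policies to a scalar team return
    candidates: callable giving alternative policies for agent i
    """
    for _ in range(iters):
        i = random.randrange(len(policies))        # pick one agent to update
        best, best_val = policies[i], joint_return(policies)
        for cand in candidates(i):                 # try alternatives for agent i only
            trial = policies[:i] + [cand] + policies[i + 1:]
            val = joint_return(trial)
            if val > best_val:
                best, best_val = cand, val
        policies[i] = best                         # greedy per-coordinate update
    return policies
```

Each per-agent step never decreases the team return, but the procedure can stall at a point where no unilateral change helps even though a coordinated joint change would, which is exactly the sub-optimal-equilibrium failure mode the abstract describes.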


Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning
A general meta-learning-based framework, Mixing Network with Meta Policy Gradient (MNMPG), that distills the global hierarchy for delicate reward decomposition and outperforms current state-of-the-art MARL algorithms on 4 of 5 super-hard scenarios.
Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings
Learned Belief Search is presented, a computationally efficient search procedure for partially observable environments that uses an approximate auto-regressive counterfactual belief learned as a supervised task.
Human-Agent Cooperation in Bridge Bidding
We introduce a human-compatible reinforcement-learning approach to a cooperative game, making use of a third-party hand-coded human-compatible bot to generate initial training data and to perform…


Improving Policies via Search in Cooperative Partially Observable Games
This paper proposes two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game, and proves that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error).
Learning Multi-agent Implicit Communication Through Actions: A Case Study in Contract Bridge, a Collaborative Imperfect-Information Game
This work completes the learning process and introduces the novel algorithm Policy-Belief-Iteration (“P-BIT”), which mimics both components mentioned above and uses a novel auxiliary reward to encourage information exchange by actions.
Extending Q-Learning to General Adaptive Multi-Agent Systems
This paper proposes a fundamentally different approach to Q-Learning, dubbed Hyper-Q, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference.
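The two Hyper-Q ingredients named in that summary (values over mixed strategies, plus a Bayesian opponent model) can be sketched as below. This is an illustrative sketch under simplifying assumptions, not the paper's implementation; the function names and the discretization by rounding are hypothetical:

```python
def posterior_mean(counts):
    """Bayesian estimate of the opponent's mixed strategy from observed
    action counts: mean of a Dirichlet(1,...,1) posterior (Laplace smoothing),
    rounded so it can serve as a table key."""
    total = sum(counts) + len(counts)
    return tuple(round((c + 1) / total, 2) for c in counts)

def hyper_q_update(Q, counts, my_strategy, reward, alpha=0.1):
    """One Hyper-Q-style update: the 'state' is the estimated opponent mixed
    strategy, and values are learned over our own mixed strategies
    (both discretized tuples so they can index a dict-based Q-table)."""
    state = posterior_mean(counts)
    key = (state, my_strategy)
    Q[key] = Q.get(key, 0.0) + alpha * (reward - Q.get(key, 0.0))
    return Q
```

Usage: after seeing the opponent play action 0 three times and action 1 once, `posterior_mean([3, 1])` gives `(0.67, 0.33)`, and `hyper_q_update(Q, [3, 1], (0.5, 0.5), reward=1.0)` nudges the value of playing the uniform mixed strategy in that estimated state toward the observed reward.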
Finding Friend and Foe in Multi-Agent Games
The DeepRole algorithm is developed, a multi-agent reinforcement learning agent that is tested on The Resistance: Avalon, the most popular hidden-role game, and is found to outperform human players as both a cooperator and a competitor.
Simple is Better: Training an End-to-end Contract Bridge Bidding Agent without Human Knowledge
This work trains a strong agent to bid competitive bridge purely through self-play, outperforming WBridge5, a championship-winning program, and is believed to be the first competitive bridge agent trained with no domain knowledge.
Time and Space: Why Imperfect Information Games are Hard
The thesis introduces an analysis of counterfactual regret minimisation (CFR), an algorithm for solving extensive-form games, and presents tighter regret bounds that describe the rate of progress, a series of theoretical tools for using decomposition, and algorithms which operate on small portions of a game at a time.
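The core update behind CFR is regret matching: play each action in proportion to its accumulated positive regret. A minimal single-information-set sketch on rock-paper-scissors (illustrative only; full CFR propagates these updates through an extensive-form game tree, which this sketch omits):

```python
def regret_matching(regrets):
    """Turn cumulative regrets into a strategy: play actions in proportion
    to their positive regret, or uniformly if no regret is positive."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    n = len(regrets)
    return [p / s for p in pos] if s > 0 else [1.0 / n] * n

def run_rps(iters=5000):
    """Symmetric self-play with regret matching on rock-paper-scissors;
    the time-averaged strategy approaches the uniform Nash equilibrium."""
    payoff = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff
    regrets = [1.0, 0.0, 0.0]   # non-uniform start so the dynamics are visible
    avg = [0.0, 0.0, 0.0]
    for _ in range(iters):
        strat = regret_matching(regrets)
        for i in range(3):
            avg[i] += strat[i]
        # expected payoff of each pure action against the current strategy
        u = [sum(payoff[i][j] * strat[j] for j in range(3)) for i in range(3)]
        ev = sum(strat[i] * u[i] for i in range(3))
        for i in range(3):
            regrets[i] += u[i] - ev   # accumulate regret for not playing i
    total = sum(avg)
    return [a / total for a in avg]
```

The current strategy cycles, but the time-averaged strategy converges toward the equilibrium, which is the quantity CFR's guarantees are stated about.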
Solving Imperfect Information Games Using Decomposition
This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution, and presents an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error.
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
AWESOME is presented, the first algorithm guaranteed to have both properties (convergence in self-play and learning a best response against stationary opponents) in games with arbitrary numbers of actions and players, and it is still the only algorithm that does so while relying only on observing the other players' actual actions (not their mixed strategies).
Safe and Nested Subgame Solving for Imperfect-Information Games
This work introduces subgame-solving techniques that outperform prior methods both in theory and practice and shows that subgame solving can be repeated as the game progresses down the game tree, leading to far lower exploitability.
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
The Bayesian action decoder (BAD) is presented, a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief conditioned on the actions taken by all agents in the environment.
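The public-belief update at the heart of that summary is ordinary Bayesian conditioning: after a publicly observed action, each private state is reweighted by how likely the shared policy was to take that action in that state. A minimal sketch with hypothetical state and action names (not BAD's learned, approximate version):

```python
def public_belief_update(belief, policy, observed_action):
    """Exact Bayesian update of a public belief over private states:
    belief: dict mapping private state -> prior probability
    policy: dict mapping private state -> {action: probability} (known, shared)
    Reweight each state by the likelihood of the observed action, renormalize."""
    new = {s: p * policy[s].get(observed_action, 0.0) for s, p in belief.items()}
    z = sum(new.values())
    if z == 0:
        raise ValueError("observed action has zero probability under the belief")
    return {s: p / z for s, p in new.items()}
```

For example, with a uniform prior over two hidden states and a policy that plays "hint" with probability 0.8 in state "red" but 0.2 in state "blue", observing "hint" shifts the public belief to 0.8 on "red". BAD replaces this exact tabular update with a learned approximation so it scales to large games.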