Corpus ID: 235417032

DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

@inproceedings{zha2021douzero,
  title={DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning},
  author={Daochen Zha and Jingru Xie and Wenye Ma and Sheng Zhang and Xiangru Lian and Xia Hu and Ji Liu},
  booktitle={International Conference on Machine Learning},
  year={2021}
}
Games are abstractions of the real world, where artificial agents learn to compete and cooperate with other agents. While significant achievements have been made in various perfect- and imperfect-information games, DouDizhu (a.k.a. Fighting the Landlord), a three-player card game, is still unsolved. DouDizhu is a very challenging domain with competition, collaboration, imperfect information, large state space, and particularly a massive set of possible actions where the legal actions vary… 
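DouZero's training is built on Deep Monte-Carlo value estimation: every (state, action) value is regressed toward the sampled episode return with no bootstrapping. A minimal sketch of that update, assuming a tabular stand-in for DouZero's neural Q-network (the real system encodes card features with an LSTM) and toy state/action names:

```python
# Sketch of the Deep Monte-Carlo update behind DouZero (assumption: a
# tabular Q stands in for the neural network; "hand_A", "play_pair",
# etc. are hypothetical toy labels). Every step of the episode is pulled
# toward the final sampled return, with no bootstrapping.
from collections import defaultdict

def mc_update(q, episode, ret, lr=0.1):
    """Move Q(s, a) toward the episode's final return for every step taken."""
    for state, action in episode:
        q[(state, action)] += lr * (ret - q[(state, action)])
    return q

q = defaultdict(float)
# Toy episode: two decisions, then the landlord wins (return +1).
episode = [("hand_A", "play_pair"), ("hand_B", "pass")]
q = mc_update(q, episode, ret=1.0)
```

Monte-Carlo targets are unbiased but high-variance; DouZero compensates with massive parallel self-play rather than bootstrapped targets.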

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

Integrating these two techniques into DouZero yields a DouDizhu AI system that achieves better performance and ranks top on the Botzone leaderboard among more than 400 AI agents, including DouZero itself.

Deep Reinforcement Learning for Two-Player DouDizhu

This paper implements and improves the DouZero system on two-player DouDizhu, a variant of the classic DouDizhu with no cooperation between the players yet more hidden information, and designs a filter network based on supervised learning to improve the quality of training data and thus accelerate training.

DanZero: Mastering GuanDan Game with Reinforcement Learning

This paper proposes DanZero, the first AI program for GuanDan, trained with reinforcement learning on a distributed framework, and demonstrates DanZero's outstanding performance.

PerfectDou: Dominating DouDizhu with Perfect Information Distillation

This paper proposes PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game. It uses an actor-critic framework with a technique named perfect information distillation, which lets the agents exploit global information to guide policy training as if DouDizhu were a perfect-information game.

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

The empirical results demonstrate that policies found by many existing methods, including Neural Fictitious Self-Play and Policy Space Response Oracles, are prone to exploitation by adversarial opponents, whereas the output policies of the proposed algorithms are robust to exploitation and thus outperform existing methods.

Speedup Training Artificial Intelligence for Mahjong via Reward Variance Reduction

Results show that RVR significantly reduces the variance in Mahjong AI training and improves both model performance and training stability, using an expected reward network to adapt to the complex, dynamic, and highly stochastic reward environment.
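The core idea of baseline-style reward variance reduction can be sketched in a few lines, assuming a fixed table of toy predictions stands in for the paper's expected reward network: subtracting a state-dependent expected reward from each stochastic outcome leaves a lower-variance learning signal.

```python
# Sketch of reward variance reduction via an expected-reward baseline
# (assumption: the toy payoffs and predictions below are illustrative,
# not from the paper). The centered signal has far lower variance than
# the raw rewards while preserving which outcomes were better than
# predicted.
import statistics

def centered_signal(rewards, expected):
    """Subtract the predicted expected reward from each sampled outcome."""
    return [r - e for r, e in zip(rewards, expected)]

rewards = [8.0, -8.0, 1.0, -1.0]     # high-variance Mahjong-style payoffs
expected = [7.0, -7.5, 0.5, -0.5]    # toy predictions of the expected reward
signal = centered_signal(rewards, expected)

var_raw = statistics.pvariance(rewards)      # 32.5 on this toy data
var_centered = statistics.pvariance(signal)  # far smaller
```

The better the reward model's predictions, the more of the payoff variance is removed from the gradient signal.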

Hierarchical Architecture for Multi-Agent Reinforcement Learning in Intelligent Game

  • Bin Li
  • 2022 International Joint Conference on Neural Networks (IJCNN)
A hierarchical architecture learning paradigm that methodologically combines multi-agent and single-agent algorithms in multi-agent environments is proposed, and macro-operations are introduced to reduce the original action space while mitigating the scalability issue.

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

A multi-player multi-agent distributed deep reinforcement learning toolbox is developed, released, and validated on Wargame, a complex environment, demonstrating the toolbox's usability for multi-player multi-agent distributed deep reinforcement learning in complex games.

More Like Real World Game Challenge for Partially Observable Multi-Agent Cooperation

The WGC is a lightweight, flexible, and easy-to-use environment with a clear framework that users can easily configure, and it introduces challenges that better reflect real-world characteristics.

TiZero: Mastering Multi-Agent Football with Curriculum Learning and Self-Play

This paper develops a multi-agent system to play the full 11 vs. 11 game mode, without demonstrations, and introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that optimizes the policies of multiple agents jointly.

Suphx: Mastering Mahjong with Deep Reinforcement Learning

An AI for Mahjong, named Suphx, is designed based on deep reinforcement learning with several newly introduced techniques, including global reward prediction, oracle guiding, and run-time policy adaptation; it is the first computer program to outperform most top human players in Mahjong.

Combinational Q-Learning for Dou Di Zhu

This paper proposes a novel method to handle combinatorial actions, called combinational Q-learning (CQL), which employs a two-stage network to reduce the action space and leverages order-invariant max-pooling operations to extract relationships between primitive actions.
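The order-invariance property mentioned above is easy to illustrate, assuming toy hand-written card embeddings in place of CQL's learned network features: an element-wise max over the primitive cards in a combination yields the same encoding regardless of the order the cards are listed in.

```python
# Sketch of an order-invariant action encoding via max-pooling, in the
# spirit of combinational Q-learning (assumption: the 3-dimensional
# embeddings below are toy values, not learned features).

def encode_combo(card_embeddings):
    """Element-wise max over the embeddings of the cards in a combination."""
    dims = len(card_embeddings[0])
    return tuple(max(card[d] for card in card_embeddings) for d in range(dims))

three = [0.2, 0.9, 0.1]   # toy embedding for a '3'
four = [0.7, 0.3, 0.5]    # toy embedding for a '4'

# Listing the cards as (3, 4) or (4, 3) produces the same encoding.
a = encode_combo([three, four])
b = encode_combo([four, three])
```

Because the pooled encoding ignores card order, the Q-network never has to learn that permutations of the same combination are equivalent actions.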

DeltaDou: Expert-level Doudizhu AI through Self-play

The results show that self-play can significantly improve the agent's performance in the multi-agent imperfect-information game Doudizhu: even starting from a weak AI, the agent reaches human expert level after days of self-play and training.

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, together with empirical game-theoretic analysis to compute meta-strategies for policy selection; it generalizes previous algorithms such as InRL.
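The meta-strategy step above solves a small matrix game built from the empirical payoffs between the policies in the population. A minimal sketch, assuming a toy 2x2 zero-sum payoff table and plain fictitious play in place of the paper's full empirical game-theoretic analysis:

```python
# Sketch of computing a meta-strategy from an empirical payoff matrix
# (assumption: matching pennies stands in for a learned policy
# population's payoff table). Fictitious play's empirical frequencies
# approximate an equilibrium mixture over the policies.

def fictitious_play(payoff, iters=10000):
    """Approximate row-player meta-strategy for a zero-sum matrix game."""
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Row player (maximizer) best-responds to the column empirical mix.
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(m))
                    for i in range(n)]
        row_counts[row_vals.index(max(row_vals))] += 1
        # Column player (minimizer) best-responds to the row empirical mix.
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(n))
                    for j in range(m)]
        col_counts[col_vals.index(min(col_vals))] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts]

# Matching pennies: the meta-strategy should approach (0.5, 0.5).
mix = fictitious_play([[1.0, -1.0], [-1.0, 1.0]])
```

In the full algorithm, each new best-response policy is trained with deep RL against the current meta-strategy mixture, then added to the payoff matrix.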

Mastering Atari, Go, chess and shogi by planning with a learned model

The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

A deep reinforcement learning framework is presented to tackle complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games; the framework is of low coupling and high scalability, enabling efficient exploration at large scale.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

This paper introduces the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge, and combines fictitious self-play with deep reinforcement learning.
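The combination described above keeps two policies per agent: a best response trained with RL and an average policy trained by supervised learning on the agent's own behavior. A minimal sketch of the action-selection rule, assuming toy dictionaries in place of the learned networks and a hypothetical mixing weight eta=0.1:

```python
# Sketch of NFSP-style action selection (assumption: the state label
# "s0", the actions, and eta=0.1 are illustrative placeholders for the
# learned best-response and average-policy networks).
import random

def nfsp_act(state, best_response, average_policy, eta=0.1, rng=random):
    """With probability eta play the best response, else the average policy."""
    if rng.random() < eta:
        return best_response[state]    # greedy, exploitation-driven policy
    return average_policy[state]       # time-averaged, equilibrium-seeking policy

best_response = {"s0": "raise"}
average_policy = {"s0": "call"}
rng = random.Random(0)
actions = [nfsp_act("s0", best_response, average_policy, rng=rng)
           for _ in range(1000)]
```

The average policy is the one that approaches a Nash equilibrium; the best response exists to keep generating data that the average policy can learn from.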

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

Towards Playing Full MOBA Games with Deep Reinforcement Learning

This paper proposes a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning, and develops a combination of novel and existing learning techniques, including curriculum self-play learning, policy distillation, off-policy adaptation, multi-head value estimation, and Monte-Carlo tree search.

Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
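The adaptation in question, Double DQN, decouples action selection from action evaluation in the bootstrap target. A minimal sketch, assuming plain lists stand in for the online and target Q-networks:

```python
# Sketch of the Double Q-learning target (assumption: lists of Q-values
# stand in for the online and target networks). The online network
# selects the argmax action at s'; the target network evaluates it,
# which breaks the max-operator's overestimation bias.

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Bootstrap target y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

# Online net prefers action 1, so the target net's value for action 1 is used:
y = double_dqn_target(reward=1.0, gamma=0.9,
                      q_online_next=[0.2, 0.8],
                      q_target_next=[0.5, 0.4])
# y = 1.0 + 0.9 * 0.4 = 1.36
```

Standard DQN would instead take max over `q_target_next` directly (giving 1.0 + 0.9 * 0.5 = 1.45 here), systematically preferring overestimated values.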