Corpus ID: 237572122

PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation

@article{Bao2021PLATOXLET,
  title={PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation},
  author={Siqi Bao and Huang He and Fan Wang and Hua Wu and Haifeng Wang and Wenquan Wu and Zhihua Wu and Zhen Guo and Hua Lu and Xinxian Huang and Xin Tian and Xinchao Xu and Yingzhan Lin and Zhengyu Niu},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.09519}
}
To explore the limits of dialogue generation pre-training, we present the PLATO-XL models, with up to 11 billion parameters, trained on both Chinese and English social media conversations. To train such large models, we adopt a unified transformer architecture with high computation and parameter efficiency. In addition, we carry out multi-party aware pre-training to better distinguish the characteristic information in social media conversations. With such designs, PLATO-XL successfully…
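The "unified transformer" the abstract refers to shares one set of parameters between context understanding (bi-directional attention) and response generation (uni-directional attention), which is what makes it parameter-efficient; the multi-party aware pre-training suggests per-speaker role signals in the input. Below is a minimal sketch of both ideas; the function names, shapes, and the role-embedding formulation are illustrative assumptions, not the paper's implementation.

```python
import torch

def unified_attention_mask(ctx_len: int, resp_len: int) -> torch.Tensor:
    """Hybrid attention mask: context tokens attend bi-directionally over
    the whole context; response tokens attend to the full context plus
    earlier response tokens (causal). 1 = may attend, 0 = masked."""
    total = ctx_len + resp_len
    mask = torch.zeros(total, total)
    mask[:ctx_len, :ctx_len] = 1                       # context <-> context
    mask[ctx_len:, :ctx_len] = 1                       # response -> context
    mask[ctx_len:, ctx_len:] = torch.tril(torch.ones(resp_len, resp_len))
    return mask

class RoleAwareEmbedding(torch.nn.Module):
    """Hypothetical multi-party input embedding: a learned vector per
    speaker role is added to each token embedding so the model can
    distinguish who said what in a multi-party thread."""
    def __init__(self, vocab_size: int, n_roles: int, d_model: int):
        super().__init__()
        self.tok = torch.nn.Embedding(vocab_size, d_model)
        self.role = torch.nn.Embedding(n_roles, d_model)

    def forward(self, token_ids: torch.Tensor, role_ids: torch.Tensor) -> torch.Tensor:
        return self.tok(token_ids) + self.role(role_ids)
```

With one shared stack behind this mask, the same parameters serve both the encoder-like view of the context and the decoder-like view of the response, rather than maintaining two separate networks.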

Citations

Finetuning Large-Scale Pre-trained Language Models for Conversational Recommendation with Knowledge Graph
  • Lingzhi Wang, Huang Hu, Lei Sha, Can Xu, Kam-Fai Wong, Daxin Jiang
  • Computer Science
  • ArXiv
  • 2021
TLDR
To unify the two modules of dialogue generation and item recommendation into a PLM-based framework, RID expands the generation vocabulary of the PLM to include an extra item vocabulary, and introduces a vocabulary pointer to control when to recommend target items during generation.
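The "vocabulary pointer" summarized above amounts to deciding, at each decoding step, whether the next token comes from the ordinary word vocabulary or from the appended item vocabulary. A minimal sketch of one plausible gating formulation follows; the sigmoid gate and all names here are assumptions, not RID's actual code.

```python
import torch
import torch.nn.functional as F

def pointer_step(hidden: torch.Tensor,
                 word_head: torch.nn.Linear,
                 item_head: torch.nn.Linear,
                 gate_head: torch.nn.Linear) -> torch.Tensor:
    """One decoding step over an expanded vocabulary (words + items).
    `hidden` is the decoder state [batch, d_model]. The scalar gate
    decides how much probability mass goes to the item vocabulary
    versus the ordinary word vocabulary."""
    p_item = torch.sigmoid(gate_head(hidden))          # [batch, 1]
    word_probs = F.softmax(word_head(hidden), dim=-1)  # [batch, |V_word|]
    item_probs = F.softmax(item_head(hidden), dim=-1)  # [batch, |V_item|]
    # Concatenate into one distribution over the expanded vocabulary.
    return torch.cat([(1.0 - p_item) * word_probs,
                      p_item * item_probs], dim=-1)
```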
Partner Personas Generation for Diverse Dialogue Generation
  • Hongyuan Lu, W. Lam, Hong Cheng, Helen M. Meng
  • Computer Science
  • 2021
TLDR
A novel framework is offered that leverages automatic partner personas generation to enhance the succeeding dialogue generation, incorporating reinforcement learning with a dedicated critic network for reward judgment.

References

SHOWING 1-10 OF 45 REFERENCES
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
TLDR
This work proposes a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge-grounded dialogues, and conversational question answering, and introduces discrete latent variables to tackle the inherent one-to-many mapping problem in response generation.
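The discrete latent variable addresses the one-to-many problem: one context admits many valid responses, so generation is conditioned on a latent "response intent" z drawn from K discrete values. A toy sketch of a decoding scheme under that framing; the callables are placeholders, and PLATO itself additionally trains a response-selection head to rank candidates.

```python
from typing import Callable, List

def latent_guided_response(
    context: str,
    K: int,
    generate: Callable[[str, int], str],  # decode a response for (context, z)
    score: Callable[[str, str], float],   # coherence score for (context, response)
) -> str:
    """Enumerate each discrete latent value z, generate one candidate
    response per z, and return the best-scoring candidate. Different z
    values yield different but equally valid responses to the same
    context, which is the one-to-many idea."""
    candidates: List[str] = [generate(context, z) for z in range(K)]
    return max(candidates, key=lambda r: score(context, r))
```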
EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training
  • Hao Zhou, Pei Ke, +11 authors Jie Tang
  • Computer Science
  • ArXiv
  • 2021
TLDR
EVA, a Chinese dialogue system built on the largest Chinese pre-trained dialogue model, with 2.8B parameters, is proposed; extensive automatic and human evaluations show that EVA outperforms other Chinese pre-trained dialogue models, especially in multi-turn human-bot conversations.
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
TLDR
It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful, and context-consistent responses than strong baseline systems.
A Large-Scale Chinese Short-Text Conversation Dataset
TLDR
A large-scale cleaned Chinese conversation dataset, LCCC, is presented, containing a base version (6.8 million dialogues) and a large version (12.0 million dialogues), along with pre-trained dialogue models trained on LCCC-base and LCCC-large, respectively.
Proactive Human-Machine Conversation with Explicit Conversation Goal
TLDR
Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations.
Know More about Each Other: Evolving Dialogue Strategy via Compound Assessment
TLDR
A novel Generation-Evaluation framework is developed for multi-turn conversations with the objective of letting both participants know more about each other; the proposed method significantly outperforms other state-of-the-art approaches.
Deep Reinforcement Learning for Dialogue Generation
TLDR
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
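The setup summarized above can be written as a plain REINFORCE update: sample a response from the policy, compute a composite reward from the three properties, and scale the log-likelihood gradient by that reward. A minimal sketch follows; the reward weights and function names are assumptions, not the paper's exact formulation.

```python
import torch

def composite_reward(r_ease: float, r_info: float, r_coherence: float,
                     weights=(0.25, 0.25, 0.5)) -> float:
    """Weighted sum of the three conversational properties; the weights
    here are assumed for illustration."""
    w1, w2, w3 = weights
    return w1 * r_ease + w2 * r_info + w3 * r_coherence

def reinforce_loss(log_probs: torch.Tensor, reward: float) -> torch.Tensor:
    """REINFORCE objective for one sampled response: `log_probs` holds
    the per-token log-probabilities of the sampled response under the
    current policy. Minimizing this loss raises the likelihood of
    high-reward responses."""
    return -reward * log_probs.sum()
```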
CPM: A Large-scale Generative Chinese Pre-trained Language Model
TLDR
CPM, with 2.6 billion parameters and 100 GB of Chinese training data, is the largest Chinese pre-trained language model, and could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding.
MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
TLDR
This work uses crowdsourced workers to re-annotate states and utterances based on the original utterances in the dataset, and benchmarks a number of state-of-the-art dialogue state tracking models on MultiWOZ 2.1, showing joint state tracking performance on the corrected state annotations.
Recipes for Building an Open-Domain Chatbot
TLDR
Human evaluations show the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements, and the limitations of this work are discussed by analyzing failure cases of the models.