PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation

Jing Gu, Qing-yang Wu, Chongruo Wu, Weiyan Shi, Zhou Yu
Large pre-trained language generation models such as GPT-2 have demonstrated their effectiveness as language priors by reaching state-of-the-art results in various language generation tasks. However, the performance of pre-trained models on task-oriented dialog tasks is still under-explored. We propose a Pre-trained Role Alternating Language model (PRAL), explicitly designed for task-oriented conversational systems. We design several techniques: start position randomization, knowledge…

When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training

DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling the size of DAPT data does not help.

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

GALAXY is a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning and has a stronger few-shot ability than existing models under various low-resource settings.

Pretrained Language Models for Text Generation: A Survey

This paper presents an overview of the major advances achieved in the topic of pretrained language models for text generation and discusses how to adapt existing PLMs to model different input data and satisfy special properties in the generated text.

DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation

This paper proposes DialoKG, a novel task-oriented dialogue system that effectively incorporates knowledge into a language model and introduces a structure-aware knowledge embedding technique and a knowledge graph-weighted attention masking strategy to facilitate the system selecting relevant information during the dialogue generation.

ChainCQG: Flow-Aware Conversational Question Generation

This work designs ChainCQG as a two-stage architecture that learns question-answer representations across multiple dialogue turns using a flow propagation training strategy and significantly outperforms both answer-aware and answer-unaware SOTA baselines.

A Survey of Pretrained Language Models Based Text Generation

This survey presents the recent advances achieved in the topic of PLMs for text generation and introduces three key points of applying PLMs to text generation, beginning with how to encode the input data as representations that preserve input semantics and can be fused into PLMs.

Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation

This work proposes a framework to model the mutual dependency between text-visual features, where the model not only needs to learn the probability of generating the next dialog utterance given preceding dialog utterances and visual contexts, but also the likelihood of predicting the visual features in which a dialog utterance takes place.

Recent Advances in Neural Text Generation: A Task-Agnostic Survey

A task-agnostic survey of recent advances in neural text generation is presented, with the advances grouped under the following four headings: data construction, neural frameworks, training and inference strategies, and evaluation metrics.

Effectiveness of Pre-training for Few-shot Intent Classification

The high effectiveness of IntentBERT confirms the feasibility and practicality of few-shot intent detection, and its high generalization ability across different domains suggests that intent classification tasks may share a similar underlying structure, which can be learned from a small set of labeled data.

You Don’t Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers’ Private Personas

It is shown that speakers’ personas can be inferred through a simple neural network with high accuracy, and effective defense objectives are proposed to protect persona leakage from hidden states.

Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models

Alternating Recurrent Dialog Model (ARDM) is a simple, general, and effective framework that outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and MultiWOZ and can generalize to more challenging, non-collaborative tasks such as persuasion.

SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model

A new method SOLOIST is presented, which uses transfer learning to efficiently build task-oriented dialog systems at scale using a Transformer-based auto-regressive language model, which subsumes different dialog modules into a single neural model.

Hello, It’s GPT-2 - How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems

This paper proposes a task-oriented dialogue model that operates solely on text input: it effectively bypasses explicit policy and language generation modules and holds promise to mitigate the data scarcity problem, and to support the construction of more engaging and more eloquent task-oriented conversational agents.

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

A new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo is introduced which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model which shows strong improvements over the current state-of-the-art end-to-end conversational models.

A Network-based End-to-End Trainable Task-oriented Dialogue System

This work introduces a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework that can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset

This work introduces the initial release of the Taskmaster-1 dataset, which includes 13,215 task-based dialogs comprising six domains, and offers several baseline models including state-of-the-art neural seq2seq architectures with benchmark performance as well as qualitative human evaluations.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

This work introduces the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains, and presents a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots provided as input.

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

The Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning multiple domains and topics, is introduced; at a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.