KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

@article{Zhou2020KdConvAC,
  title={KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation},
  author={Hao Zhou and Chujie Zheng and Kaili Huang and Minlie Huang and Xiaoyan Zhu},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.04100}
}
The research of knowledge-driven conversational systems is largely limited due to the lack of dialog data which consists of multi-turn conversations on multiple topics and with knowledge annotations. In this paper, we propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs. Our corpus contains 4.5K conversations from three domains (film, music, and travel), and 86K utterances with an average turn… 

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation
TLDR
NaturalConv is a Chinese multi-turn topic-driven conversation dataset in which participants may chat about anything they want, as long as some element of the topic is mentioned and topic shifts are smooth; it is intended as a benchmark for evaluating the validity and naturalness of multi-turn conversation systems.
KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System
TLDR
The first Cantonese knowledge-driven Dialogue Dataset for REStaurant in Hong Kong, which grounds the information in multi-turn conversations to one specific restaurant and defines fine-grained slots and intents to better capture semantic information.
Leveraging Different Context for Response Generation through Topic-guided Multi-head Attention
TLDR
A context-augmented model, named TGMA-RG, which leverages the conversational context to promote interactivity and persistence of multi-turn dialogues through a topic-guided multi-head attention mechanism, built on a hierarchical encoder-decoder model with multi-head attention.
Prediction, Selection, and Generation: Exploration of Knowledge-Driven Conversation System
TLDR
This paper combines knowledge bases with a pre-training model to propose a knowledge-driven conversation system that includes modules for dialogue topic prediction, knowledge matching, and dialogue generation, and reports that the system reaches state-of-the-art performance.
Keep and Select: Improving Hierarchical Context Modeling for Multi-Turn Response Generation
TLDR
A multi-turn response generation model named KS-CQ is proposed, which contains two crucial components, the Keep and the Select modules, to produce a neighbor-aware context representation and a context-enriched query representation.
Guiding Topic Flows in the Generative Chatbot by Enhancing the ConceptNet with the Conversation Corpora
TLDR
This work proposes a method that supplies additional concept relations extracted from conversational corpora and reconstructs an enhanced concept graph for chatbot construction, and presents a novel, powerful, and fast graph encoding architecture, the Edge-Transformer, to replace the traditional GNN architecture.
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
TLDR
Automatic and human evaluations show that the proposed EVA2.0, a large-scale pre-trained open-domain Chinese dialogue model with 2.8 billion parameters, outperforms other open-source counterparts.
SINC: Service Information Augmented Open-Domain Conversation
TLDR
Both automatic and human evaluation show that the proposed method improves open-domain conversation; the session-level overall score in human evaluation improves by 59.29% compared with the dialogue pre-training model PLATO-2.
Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge
TLDR
This work introduces a call For Customized conversation (FoCus) dataset where the customized answers are built with the user's persona and Wikipedia knowledge and shows that the utterances of the data are constructed with the proper knowledge and persona through grounding quality assessment.
Construction of Hierarchical Structured Knowledge-based Recommendation Dialogue Dataset and Dialogue System
TLDR
A dialogue dataset, Japanese Movie Recommendation Dialogue (JMRD), in which the recommender recommends one movie in a long dialogue (23 turns on average), and a movie recommendation dialogue system that considers the structure of the external knowledge and the history of the knowledge used.

References

SHOWING 1-10 OF 51 REFERENCES
Towards Exploiting Background Knowledge for Building Conversation Systems
TLDR
This work creates a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie.
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations
TLDR
This work introduces Topical-Chat, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don’t have explicitly defined roles, to help further research in open-domain conversational AI.
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
TLDR
This work proposes a new task on applying dynamic knowledge graphs in neural conversation models, presents a novel TV series conversation corpus (DyKgChat) for the task, and shows that the proposed approach outperforms previous knowledge-grounded conversation models.
Wizard of Wikipedia: Knowledge-Powered Conversational agents
TLDR
The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.
Flexible End-to-End Dialogue System for Knowledge Grounded Conversation
TLDR
This work designs a dynamic knowledge enquirer that selects different answer entities at different positions in a single response according to the local context, enabling the model to deal with out-of-vocabulary entities.
OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
TLDR
This work proposes the DialKG Walker model, a conversational reasoning model that learns the symbolic transitions of dialog contexts as structured traversals over KG, and predicts natural entities to introduce given previous dialog contexts via a novel domain-agnostic, attention-based graph path decoder.
Generating Long and Diverse Responses with Neural Conversation Models
TLDR
This work presents new training and decoding methods that improve the quality, coherence, and diversity of long responses generated using sequence-to-sequence models, and introduces a stochastic beam-search algorithm with segment-by-segment reranking which lets us inject diversity earlier in the generation process.
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
TLDR
A new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading is presented, allowing for more focused integration of external knowledge than has been possible in prior approaches.
Proactive Human-Machine Conversation with Explicit Conversation Goal
TLDR
Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations.
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique