CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

@article{Zhu2020CrossWOZAL,
  title={CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset},
  author={Qi Zhu and Kaili Huang and Zheng Zhang and Xiaoyan Zhu and Minlie Huang},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={281--295}
}
  • Qi Zhu, Kaili Huang, Zheng Zhang, Xiaoyan Zhu, Minlie Huang
  • Published 27 February 2020
  • Computer Science
  • Transactions of the Association for Computational Linguistics
Abstract: To advance multi-domain (cross-domain) dialogue modeling as well as alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts on both user and system sides. About 60% of the…
RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling
TLDR
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations, which contains 11.2K human-to-human multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets.
BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling
TLDR
BiToD is introduced, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling, and state-of-the-art baselines are provided under three evaluation settings (monolingual, bilingual, and cross-lingual).
MultiWOZ 2.3: A multi-domain task-oriented dataset enhanced with annotation corrections and co-reference annotation
TLDR
This paper introduces MultiWOZ 2.3, in which it differentiates incorrect annotations in dialogue acts from dialogue states and identifies a lack of co-reference when publishing the updated dataset, to ensure consistency between dialogue acts and dialogue states.
MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation
TLDR
This work introduces MultiWOZ 2.4, in which all annotations in the validation set and test set on top of MultiWOZ 2.1 are refined to encourage robust and noise-resilient model training.
KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System
TLDR
The first Cantonese knowledge-driven Dialogue Dataset for REStaurant in Hong Kong is presented, which grounds the information in multi-turn conversations to one specific restaurant and designs fine-grained slots and intents to better capture semantic information.
Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking
TLDR
This work enhances the transfer learning process by intermediate fine-tuning of pretrained multilingual models, where the multilingual models are fine-tuned with different but related data and/or tasks.
GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems
TLDR
A novel data curation method is introduced that generates GlobalWoZ, a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases, based on translating dialogue templates and filling them with local entities in the target-language countries.
Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems
TLDR
This work identifies two main challenges that, combined, hinder faster progress in multilingual TOD: current state-of-the-art TOD models based on large pretrained neural language models are data hungry; at the same time, data acquisition for TOD use cases is expensive and tedious.
Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
TLDR
This paper shows that, given a large-scale dialogue data set in one language, one can automatically produce an effective semantic parser for other languages using machine translation, and proposes automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values and eliminate costly human supervision.
Survey of Available Datasets for Designing Task Oriented Dialogue Agents
  • Manisha Thakkar, N. Pise
  • 2019 International Conference on Mechatronics, Remote Sensing, Information Systems and Industrial Information Technologies (ICMRSISIIT)
  • 2019
Dialogue Systems are increasingly popular with the recent advances in neural approaches and NLP applied to conversational AI. Alexa, Siri, Cortana, Google Mini are handily used by many users to do…

References

SHOWING 1-10 OF 50 REFERENCES
MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
TLDR
The Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics, is introduced, at a size of 10k dialogues, at least one order of magnitude larger than all previous annotated task-oriented corpora.
Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
TLDR
This work introduces the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains, and presents a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots provided as input.
Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
TLDR
A Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating transfer when predicting (domain, slot, value) triplets not encountered during training.
MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
TLDR
This work uses crowdsourced workers to re-annotate state and utterances based on the original utterances in the dataset, benchmarks a number of state-of-the-art dialogue state tracking models on the MultiWOZ 2.1 dataset, and shows the joint state tracking performance on the corrected state annotations.
MultiWOZ 2.1: Multi-Domain Dialogue State Corrections and State Tracking Baselines
TLDR
This work uses crowdsourced workers to fix the state annotations and utterances in the original version of the MultiWOZ data, hoping that this dataset resource will allow for more effective dialogue state tracking models to be built in the future.
Building a Conversational Agent Overnight with Dialogue Self-Play
TLDR
A new corpus of 3,000 dialogues spanning 2 domains collected with M2M is proposed, and comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows are presented.
Key-Value Retrieval Networks for Task-Oriented Dialogue
TLDR
This work proposes a new neural dialogue agent that is able to effectively sustain grounded, multi-domain discourse through a novel key-value retrieval mechanism and significantly outperforms a competitive rule-based system and other existing neural dialogue architectures on the provided domains according to both automatic and human evaluation metrics.
User Modeling for Task Oriented Dialogues
TLDR
This work designs a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed length representations using Recurrent Neural Networks (RNN), and develops several variants by utilizing a latent variable model to inject random variations into user responses to promote diversity in simulated user responses.
Frames: a corpus for adding memory to goal-oriented dialogue systems
TLDR
The frame tracking task is proposed, which consists of keeping track of different semantic frames throughout each dialogue; a rule-based baseline is also proposed, and the task is analysed through this baseline.
A Network-based End-to-End Trainable Task-oriented Dialogue System
TLDR
This work introduces a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework that can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.