• Corpus ID: 235358381

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung
Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances…

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation

This work proposes a novel outline-based annotation process for multilingual TOD datasets, where domain-specific abstract schemata of dialogue are mapped into natural language outlines, and enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

A new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks, and shows that the best performance entails the combination of conversational specialization in the target language and few-shot transfer for the concrete TOD task.

Natural Language Processing for Multilingual Task-Oriented Dialogue

This tutorial will provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP.

ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For Low-resource Language

ViWOZ is the first multi-turn, multi-domain task-oriented dataset in Vietnamese, a low-resource language, and provides a comprehensive benchmark of both modular and end-to-end models in low-resource language scenarios.

Cross-Lingual Transfer Learning for Arabic Task-Oriented Dialogue Systems Using Multilingual Transformer Model mT5

This study aims to explore the effectiveness of cross-lingual transfer learning in building an end-to-end Arabic task-oriented DS using the mT5 transformer model, and presents the cross-lingual transfer learning deployed with three different approaches: mSeq2Seq, Cross-Lingual Pre-training (CPT), and Mixed-Language Pre-training (MLT).

GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

A novel data curation method is introduced that generates GlobalWoZ, a large-scale multilingual ToD dataset globalized from an English ToD dataset, for three unexplored use cases of multilingual ToD systems.

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

An extensive overview of existing methods and resources in multilingual ToD is provided as an entry point to this exciting and emerging field, and parallels are drawn between components of the ToD pipeline and other NLP tasks, which can inspire solutions for learning in low-resource scenarios.

AraConv: Developing an Arabic Task-Oriented Dialogue System Using Multi-Lingual Transformer Model mT5

This study introduces the first Arabic end-to-end generative model for task-oriented DS (AraConv), which uses the multi-lingual transformer model mT5 with different settings, and indicates that the AraConv model performed better in the joint-training setting than in the monolingual setting.

Investigating Effect of Dialogue History in Multilingual Task Oriented Dialogue Systems

An efficient and effective training solution for multilingual task-orientated dialogue systems, using the same dataset generation pipeline and end-to-end dialogue system architecture as the SOTA paper BiToD, which adopted some key design choices for a minimalistic natural language design.

Zero-Shot Dialogue State Tracking via Cross-Task Transfer

This work proposes TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST.



Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems

Attention-Informed Mixed-Language Training (MLT) is introduced, a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems that leverages very few task-related parallel word pairs to generate code-switching sentences for learning the inter-lingual semantics across languages.

CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

The large size and rich annotation of CrossWOZ make it suitable to investigate a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, user simulation, etc.

Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog

This paper presents a new data set of 57k annotated utterances in English, Spanish and Thai and uses this data set to evaluate three different cross-lingual transfer methods, finding that given several hundred training examples in the target language, the latter two methods outperform translating the training data.

Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems

This work identifies two main challenges that combined hinder the faster progress in multilingual TOD: current state-of-the-art TOD models based on large pretrained neural language models are data hungry; at the same time data acquisition for TOD use cases is expensive and tedious.

MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems

This paper introduces Levenshtein belief spans (Lev), which allow efficient dialogue state tracking with a minimal generation length and greatly improve the inference efficiency of MinTL-based systems.

RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling

RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations, which contains 11.2K human-to-human multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets.

Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

This work introduces the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains, and presents a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots provided as input.

A Simple Language Model for Task-Oriented Dialogue

SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem, which allows it to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2.

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems

The Action-Based Conversations Dataset (ABCD), a fully-labeled dataset with over 10K human-to-human dialogues containing 55 distinct user intents requiring unique sequences of actions constrained by policies to achieve task success, is introduced.

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding

Different approaches to training an SLU component with little supervision for two new languages, Hindi and Turkish, are examined, and it is shown that with only a few hundred labeled examples the authors can surpass the approaches proposed in the literature.