ODSQA: Open-Domain Spoken Question Answering Dataset

  • Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-yi Lee
  • 2018 IEEE Spoken Language Technology Workshop (SLT)
Reading comprehension by machine has been widely studied, but machine comprehension of spoken content is still a less investigated problem. In this paper, we release the Open-Domain Spoken Question Answering Dataset (ODSQA) with more than three thousand questions. To the best of our knowledge, this is the largest real SQA dataset. On this dataset, we found that ASR errors have a catastrophic impact on SQA. To mitigate the effect of ASR errors, subword units are involved, which brings consistent…

Knowledge Distillation for Improved Accuracy in Spoken Question Answering

  • Chenyu You, Nuo Chen, Yuexian Zou
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
This work devises a training strategy to perform knowledge distillation (KD) from spoken documents and their written counterparts, improving the performance of the student model by reducing the misalignment between automatic and manual transcripts.

Improving Spoken Question Answering Using Contextualized Word Representation

  • Dan Su, Pascale Fung
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This paper proposes using contextualized word representations to mitigate the effects of ASR errors, and pretraining on existing textual QA datasets to mitigate the data scarcity issue.

DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

Discrete Spoken Unit Adaptive Learning (DUAL) is proposed, leveraging unlabeled data for pre-training and being fine-tuned by the SQA downstream task; empirical results show that it yields results comparable to those obtained by cascading ASR and a text QA model, and that it is robust to real-world data.

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

A novel multi-modal residual knowledge distillation method (MRD-Net) is proposed, which further distills knowledge at the acoustic level from the audio-assistant (Audio-A) and uses a simple yet effective attention mechanism to adaptively leverage audio-text features as the new deep attention knowledge to boost network performance.

Mitigating Noisy Inputs for Question Answering

This work investigates and mitigates the effects of noise from Automatic Speech Recognition systems on two factoid Question Answering (QA) tasks; the mitigation is empirically shown to improve the accuracy of downstream neural QA systems.

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

A novel data distillation approach, DDNet, is proposed, which effectively ingests cross-modal information to achieve fine-grained representations of the speech and language modalities to ease the process of knowledge transfer.

Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation

This work proposes to mitigate the ASR errors by aligning the mismatch between ASR hypotheses and their corresponding reference transcriptions by applying an adversarial model to this domain adaptation task.

Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering

CADNet, a novel contextualized attention-based distillation approach, is proposed, which applies both cross-attention and self-attention to obtain ASR-robust contextualized embedding representations of the passage and dialogue history for performance improvements on SCQA.


Various techniques to mitigate the effects of ASR errors and to increase the accuracy of the predicted answers are discussed.

An Initial Investigation of Non-Native Spoken Question-Answering

It is found that there is an approximately linear relationship between ASR errors and the SQA assessment scores, but grammar mismatches have minimal impact.



Reading Wikipedia to Answer Open-Domain Questions

This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.

Learning to Paraphrase for Question Answering

This paper presents a general framework which learns felicitous paraphrases for various QA tasks and shows that this framework consistently improves performance, achieving competitive results despite the use of simple QA models.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

NewsQA: A Machine Comprehension Dataset

NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented, and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.

DRCD: a Chinese Machine Reading Comprehension Dataset

DRCD (Delta Reading Comprehension Dataset), an open domain traditional Chinese machine reading comprehension (MRC) dataset, is introduced, which can be a source dataset in transfer learning.

Supervised and Unsupervised Transfer Learning for Question Answering

The performance of both models on a TOEFL listening comprehension test and MCTest is significantly improved via a simple transfer learning technique from MovieQA, which achieves the state-of-the-art on all target datasets.

Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension

On the new listening comprehension task, it is found that speech recognition errors have a catastrophic impact on machine comprehension, and several approaches are proposed to mitigate the impact.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

Factoid Question Answering for Spoken Documents

This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken documents scenario, studies new information retrieval (IR) techniques designed for speech, and utilizes several levels of linguistic information for the speech-based QA task.

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding

Different approaches to train an SLU component with little supervision for two new languages, Hindi and Turkish, are examined, and it is shown that with only a few hundred labeled examples the authors can surpass the approaches proposed in the literature.