
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

by Anna Rogers, Matt Gardner and Isabelle Augenstein
Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae…


ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections
ArchivalQA is presented, a large question answering dataset consisting of 1,067,056 question-answer pairs designed for temporal news QA; the novel dataset-construction framework introduced can also be applied to create datasets over other types of collections.
Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering
It is found that explicitly aligning the representations across languages with a post-hoc finetuning step generally leads to improved performance.
TopiOCQA: Open-domain Conversational Question Answering with Topic Switching
This work introduces TOPIOCQA (pronounced Tapioca), an open-domain conversational dataset with topic switches on Wikipedia, and evaluates several baselines combining state-of-the-art document retrieval methods with neural reader models.
MFAQ: a Multilingual FAQ Dataset
The first publicly available multilingual FAQ dataset, comprising around 6M FAQ pairs collected from the web in 21 different languages; a multilingual model based on XLM-RoBERTa achieves the best results, except for English.
Generating Answer Candidates for Quizzes and Answer-Aware Question Generators
This work proposes a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or passed as input to automatic answer-aware question generators.
Evaluation Paradigms in Question Answering
Question answering (QA) primarily descends from two branches of research: (1) Alan Turing’s investigation of machine intelligence at Manchester University and (2) Cyril Cleverdon’s comparison of…
COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning (2021)
We present Semi-Structured Explanations for COPA (COPA-SSE), a new crowdsourced dataset of 9,747 semi-structured, English common sense explanations for COPA questions. The explanations are formatted…
ConveRT for FAQ Answering
A novel pre-training procedure is proposed to adapt ConveRT, an English conversational retriever model, to other languages with less training data available, and is applied for the first time to the task of Dutch FAQ answering related to the COVID-19 vaccine.


Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering
This work reviews the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques, and revisits the origin and development of OpenQA systems.
ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension
An evaluation server, ORB, is presented that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating the testing of a single model's capability to understand a wide variety of reading phenomena.
Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
QuAIL is presented, the first RC dataset to combine text-based, world-knowledge, and unanswerable questions, and to provide question-type annotation that enables diagnostics of the reasoning strategies used by a given QA system.
Automatic Spanish Translation of SQuAD Dataset for Multi-lingual Question Answering
The Translate Align Retrieve (TAR) method is developed to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 to Spanish, and this dataset is used to train Spanish QA systems by fine-tuning a Multilingual-BERT model.
ELI5: Long Form Question Answering
This work introduces the first large-scale corpus for long form question answering, a task requiring elaborate and in-depth answers to open-ended questions, and shows that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline.
Semi-supervised Training Data Generation for Multilingual Question Answering
This work annotates a small set of seed QA pairs for Korean and designs how such seeds can be combined with translated English resources to leverage those resources.
MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization
This work proposes MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization, and benchmarks existing methods and a novel multi-task baseline over MATINF to inspire further research.
WikiQA: A Challenge Dataset for Open-Domain Question Answering
The WIKIQA dataset is described, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering, which is more than an order of magnitude larger than the previous dataset.
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages, is introduced to provide a challenging benchmark for question answering quality across a wide set of languages.
Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Results suggest that most of the questions already answered correctly by the model do not necessarily require grammatical and complex reasoning; therefore, MRC datasets will need to take extra care in their design to ensure that questions correctly evaluate the intended skills.