TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

  title={TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension},
  author={Mandar Joshi and Eunsol Choi and Daniel S. Weld and Luke Zettlemoyer},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. [] Key Method We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network, that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code available at -- this http URL

Figures and Tables from this paper

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills, and finds human solvers to achieve an F1-score of 88.1%.

NLQuAD: A Non-Factoid Long Question Answering Data Set

NLQuAD’s samples exceed the input limitation of most pre-trained Transformer-based models, encouraging future research on long sequence language models and shows that Longformer outperforms the other architectures, but results are still far behind a human upper bound.

R3: Reinforced Reader-Ranker for Open-Domain Question Answering

A new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of generating the ground-truth answer to a given question, and a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning.

R3: Reinforced Ranker-Reader for Open-Domain Question Answering

This paper proposes a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of extracting the ground-truth answer to a given question, and proposes a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning.

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Multilingual Knowledge Questions and Answers is introduced, an open- domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages, making results comparable across languages and independent of language-specific passages.

Translucent Answer Predictions in Multi-Hop Reading Comprehension

This paper proposes a novel deep neural architecture, called TAP (Translucent Answer Prediction), to identify answers and evidence (in the form of supporting facts) in an RCQA task requiring multi-hop reasoning.

Quizbowl: The Case for Incremental Question Answering

This work makes two key contributions to machine learning research through Quizbowl: collecting and curating a large factoid QA dataset and an accompanying gameplay dataset, and developing a computational approach to playing Quiz Bowl that involves determining both what to answer and when to answer.

Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

QuAIL is presented, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that would enable diagnostics of the reasoning strategies by a given QA system.

Relation Module for Non-Answerable Predictions on Reading Comprehension

This paper aims to improve a MRC model’s ability to determine whether a question has an answer in a given context (e.g. the recently proposed SQuAD 2.0 task), and shows the effectiveness of the relation module on MRC.

Relation Module for Non-answerable Prediction on Question Answering

This paper aims to improve a MRC model's ability to determine whether a question has an answer in a given context (e.g. the recently proposed SQuAD 2.0 task) by developing a relation module that is adaptable to any MRC models.



SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

NewsQA: A Machine Comprehension Dataset

NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.

A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task

A thorough examination of this new reading comprehension task by creating over a million training examples by pairing CNN and Daily Mail news articles with their summarized bullet points, and showing that a neural network can be trained to give good performance on this task.

Large-scale Simple Question Answering with Memory Networks

This paper studies the impact of multitask and transfer learning for simple question answering; a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions.

Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

This work casts neural QA as a sequence labeling problem and proposes an end-to-end sequence labeling model, which overcomes all the above challenges and outperforms the baselines significantly on WebQA.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

It is shown that there is a meaningful gap between the human and machine performances, which suggests that the proposed dataset could well serve as a benchmark for question-answering.

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

This new dataset is aimed to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.

WikiQA: A Challenge Dataset for Open-Domain Question Answering

The WIKIQA dataset is described, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering, which is more than an order of magnitude larger than the previous dataset.

A Neural Network for Factoid Question Answering over Paragraphs

This work introduces a recursive neural network model, qanta, that can reason over question text input by modeling textual compositionality and applies it to a dataset of questions from a trivia competition called quiz bowl.