HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. [...] We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts…
JAM for HotpotQA
HotpotQA is a recently released question-answering dataset that involves multi-hop reasoning over multiple paragraphs of information to produce an answer. A successful model must not only report…
Learning to Generate Multi-Hop Knowledge Paths for Commonsense Question Answering
  • 2020
Commonsense question answering (QA) requires a model of general background knowledge about how the world operates and how people interact with each other before reasoning. Prior works focus primarily…
QASC: A Dataset for Question Answering via Sentence Composition
This work presents a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question, and presents a two-step approach to mitigate the retrieval challenges.
MeltingpotQA: Multi-hop Question Answering
MeltingpotQA is a question answering model that works on the HotpotQA dataset. HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting…
Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering
This work explores a delexicalized chain representation in which repeated noun phrases are replaced by variables, turning them into generalized reasoning chains, and finds that generalized chains maintain performance while also being more robust to certain perturbations.
Generating Followup Questions for Interpretable Multi-hop Question Answering
We propose a framework for answering open domain multi-hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single-hop…
FeTaQA: Free-form Table Question Answering
This work introduces FeTaQA, a new dataset with 10K Wikipedia-based pairs that yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source.
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
It is shown that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions, and will motivate further research in answering complex questions over long documents.
Repurposing Entailment for Multi-Hop Question Answering Tasks
Multee is introduced, a general architecture that can effectively use entailment models for multi-hop QA tasks and outperforms QA models trained only on the target QA datasets and the OpenAI transformer models when using an entailment function pre-trained on NLI datasets.
NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset
  • Qiyuan Zhang, Lei Wang, +4 authors Ee-Peng Lim
  • Computer Science
  • ArXiv
  • 2021
NOAHQA is introduced, a conversational and bilingual QA dataset with questions requiring numerical reasoning with compound mathematical expressions, along with a new QA model for generating a reasoning graph; the model's reasoning graph metric still has a large gap compared with that of humans.


Reading Wikipedia to Answer Open-Domain Questions
This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross sentence reasoning to find answers.
The Web as a Knowledge-Base for Answering Complex Questions
This paper proposes to decompose complex questions into a sequence of simple questions and to compute the final answer from the sequence of answers, and empirically demonstrates that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
A novel task is proposed to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods; in it, a model learns to seek and combine evidence, effectively performing multihop (alias multi-step) inference.
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
It is shown that there is a meaningful gap between the human and machine performances, which suggests that the proposed dataset could well serve as a benchmark for question-answering.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Simple and Effective Multi-Paragraph Reading Comprehension
We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well…
Know What You Don’t Know: Unanswerable Questions for SQuAD
SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
Bidirectional Attention Flow for Machine Comprehension
The BiDAF network is introduced, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
Gated Self-Matching Networks for Reading Comprehension and Question Answering
Gated self-matching networks for reading-comprehension-style question answering, which aims to answer questions from a given passage, are presented; the model holds first place on the SQuAD leaderboard for both single and ensemble models.