HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Published in: Conference on Empirical Methods in Natural Language Processing
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts…
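
As a rough illustration of what "multiple supporting documents" and "sentence-level supporting facts" can look like in practice, the sketch below models one multi-hop example in Python. The class name, field names, and toy data are assumptions made for illustration only; they are not taken from the abstract and should not be read as the dataset's official schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MultiHopExample:
    """Hypothetical container for one multi-hop QA example (illustrative only)."""
    question: str
    answer: str
    # Each context entry pairs a document title with its list of sentences.
    context: List[Tuple[str, List[str]]]
    # Each supporting fact names a document title and a sentence index in it.
    supporting_facts: List[Tuple[str, int]]


def supporting_sentences(ex: MultiHopExample) -> List[str]:
    """Return the sentences referenced by the supporting facts, in order."""
    docs = dict(ex.context)
    return [
        docs[title][idx]
        for title, idx in ex.supporting_facts
        if title in docs and 0 <= idx < len(docs[title])
    ]


# Toy data: answering requires one sentence from each of two documents,
# while the third document is a distractor.
example = MultiHopExample(
    question="In which city was the author of Book X born?",
    answer="City Z",
    context=[
        ("Book X", ["Book X is a novel.", "Book X was written by Author Y."]),
        ("Author Y", ["Author Y was born in City Z.", "Author Y wrote several novels."]),
        ("City W", ["City W is a distractor document."]),
    ],
    supporting_facts=[("Book X", 1), ("Author Y", 0)],
)

for sentence in supporting_sentences(example):
    print(sentence)
```

Keeping the supporting facts as (title, sentence index) pairs, rather than copying the sentence text, makes it easy to check that an answer draws on more than one document.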

JAM for HotpotQA

A pipeline model is developed for HotpotQA. It consists of a supporting-facts classifier, which produces relevant sentences that are grouped and passed to a question answering model, and an implementation of Bidirectional Encoder Representations from Transformers using the SQuAD dataset.

Learning to Generate Multi-Hop Knowledge Paths for Commonsense Question Answering

  • Computer Science
  • 2020
This paper proposes to learn a multi-hop knowledge path generator to generate structured evidence dynamically according to the questions, leveraging a large amount of unstructured knowledge stored in the language model to supplement the incompleteness of the knowledge base.

Explanations for CommonsenseQA: New Dataset and Models

This work human-annotates a first-of-its-kind dataset of positive and negative properties, as well as free-flow explanations, for 11K QA pairs taken from the CQA dataset, and proposes a latent representation based property retrieval model as well as a GPT-2 based property generation model with a novel two-step fine-tuning procedure.

QASC: A Dataset for Question Answering via Sentence Composition

This work presents a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question, and provides annotation for supporting facts as well as their composition.

MeltingpotQA: Multi-hop Question Answering

This work advances the baseline model to more appropriately tackle the task of multi-document question answering, and focuses on the problem of relevant paragraph selection, where the model has to decide which documents to use for answering questions.

Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering

A delexicalized chain representation, in which repeated noun phrases are replaced by variables, thus turning them into generalized reasoning chains, is explored, finding that generalized chains maintain performance while also being more robust to certain perturbations.

Generating Followup Questions for Interpretable Multi-hop Question Answering

We propose a framework for answering open domain multi-hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single-hop

Repurposing Entailment for Multi-Hop Question Answering Tasks

Multee is introduced, a general architecture that can effectively use entailment models for multi-hop QA tasks and outperforms QA models trained only on the target QA datasets and the OpenAI transformer models when using an entailment function pre-trained on NLI datasets.

Prompt-based Conservation Learning for Multi-hop Question Answering

Experimental results on the HotpotQA benchmark show that PCL is competitive for multi-hop QA and retains good performance on the corresponding single-hop sub-questions, demonstrating the efficacy of PCL in mitigating knowledge loss by forgetting.

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

QA models from the retrieve-and-read family are trained, showing that QAMPARI is challenging in terms of both passage retrieval and answer generation, reaching an F1 score of 26.6 at best.



TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.

Reading Wikipedia to Answer Open-Domain Questions

This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.

The Web as a Knowledge-Base for Answering Complex Questions

This paper proposes to decompose complex questions into a sequence of simple questions and compute the final answer from the sequence of answers, and empirically demonstrates that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

A novel task is proposed to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods; in it, a model learns to seek and combine evidence, effectively performing multi-hop (alias multi-step) inference.

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

It is shown that there is a meaningful gap between the human and machine performances, which suggests that the proposed dataset could well serve as a benchmark for question-answering.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

Know What You Don’t Know: Unanswerable Questions for SQuAD

SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

Bidirectional Attention Flow for Machine Comprehension

The BIDAF network is introduced, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.

Simple and Effective Multi-Paragraph Reading Comprehension

It is shown that it is possible to significantly improve performance by using a modified training scheme that teaches the model to ignore paragraphs that do not contain the answer, which involves sampling multiple paragraphs from each document and using an objective function that requires the model to produce globally correct output.

Gated Self-Matching Networks for Reading Comprehension and Question Answering

Gated self-matching networks are presented for reading-comprehension-style question answering, which aims to answer questions from a given passage; the model holds first place on the SQuAD leaderboard for both single and ensemble models.