Repurposing Entailment for Multi-Hop Question Answering Tasks

@article{Trivedi2019RepurposingEF,
  title={Repurposing Entailment for Multi-Hop Question Answering Tasks},
  author={H. Trivedi and Heeyoung Kwon and Tushar Khot and Ashish Sabharwal and Niranjan Balasubramanian},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.09380}
}
Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. [...] Multee uses (i) a local module that helps locate important sentences, thereby avoiding distracting information, and (ii) a global module that aggregates information by effectively incorporating importance weights. Importantly, we show that both modules can use entailment functions pre-trained on large-scale NLI datasets. We evaluate performance on MultiRC…
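The two-module design lends itself to a compact illustration. The sketch below is a hypothetical approximation rather than Multee's implementation: a generic sentence-level entailment scorer stands in for the pre-trained NLI model, its scores double as importance weights (the local module), and evidence is aggregated as a weighted average (the global module), whereas the actual global module mixes weighted sentence representations inside the entailment network.

```python
# Minimal sketch of importance-weighted entailment aggregation (illustrative only).
from typing import Callable, List

def multee_sketch(
    premise_sentences: List[str],
    hypothesis: str,
    entail_prob: Callable[[str, str], float],
) -> float:
    """Score how strongly a multi-sentence context entails a hypothesis."""
    # (i) Local module: estimate how relevant each sentence is to the hypothesis.
    #     Here the sentence-level entailment probability is reused as the
    #     importance weight; the actual model learns these weights jointly.
    importance = [entail_prob(sent, hypothesis) for sent in premise_sentences]
    total = sum(importance) or 1.0
    weights = [w / total for w in importance]

    # (ii) Global module: aggregate sentence-level evidence using the weights,
    #      approximated here as a weighted average of entailment probabilities.
    return sum(w * entail_prob(sent, hypothesis)
               for w, sent in zip(weights, premise_sentences))

# Toy usage with a lexical-overlap stand-in for a pre-trained NLI model.
def toy_entail_prob(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

context = ["Paris is the capital of France.", "The Seine flows through Paris."]
print(multee_sketch(context, "France's capital lies on the Seine.", toy_entail_prob))
```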
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
TLDR
This work introduces a simple, fast, and unsupervised iterative evidence retrieval method that outperforms all the previous methods on the evidence selection task on two datasets: MultiRC and QASC.
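The retrieval loop itself is easy to picture. The sketch below is only an illustration of the iterative idea, with plain lexical overlap standing in for the paper's embedding-based alignment scores: each hop selects the sentence covering the most still-uncovered query terms, then shrinks the query before retrieving again.

```python
# Illustrative unsupervised, iterative evidence retrieval over lexical overlap.
from typing import List, Set

def _tokens(text: str) -> Set[str]:
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def iterative_retrieval(query: str, sentences: List[str], max_hops: int = 3) -> List[str]:
    """Greedily select evidence sentences, re-querying with uncovered terms."""
    remaining = _tokens(query)        # query terms not yet covered by evidence
    pool = list(sentences)
    selected: List[str] = []
    for _ in range(max_hops):
        if not remaining or not pool:
            break
        # Pick the sentence that covers the most still-uncovered query terms.
        best = max(pool, key=lambda s: len(remaining & _tokens(s)))
        covered = remaining & _tokens(best)
        if not covered:
            break
        selected.append(best)
        pool.remove(best)
        remaining -= covered          # the next hop "re-queries" with what is left
    return selected

docs = ["Exercise increases heart rate.",
        "A higher heart rate pumps more blood.",
        "Paris is in France."]
print(iterative_retrieval("Why does exercise pump more blood?", docs))
```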
Can NLI Models Verify QA Systems' Predictions?
TLDR
The use of natural language inference (NLI) is explored as a way to achieve robust question answering systems, and it is shown that the NLI approach can generally improve the confidence estimation of a QA model across different domains.
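A minimal version of this verification setup can be sketched with an off-the-shelf NLI checkpoint. The model name and its label ordering below are assumptions to check against the model card, and the hypothesis is built by naively splicing the predicted answer onto the question, whereas the paper studies more careful question-to-statement conversion.

```python
# Sketch: use an NLI model's entailment probability as QA answer confidence.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # assumed checkpoint; labels: contradiction/neutral/entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def qa_confidence(context: str, question: str, predicted_answer: str) -> float:
    """Return P(entailment) of the answer statement given the context."""
    hypothesis = f"{question} {predicted_answer}"  # naive declarative form
    inputs = tokenizer(context, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    return probs[-1].item()  # entailment is the last label for this checkpoint

print(qa_confidence("The Eiffel Tower is in Paris.",
                    "Where is the Eiffel Tower?", "Paris"))
```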
Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
TLDR
The justification sentences selected by this unsupervised strategy improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2’s Reasoning Challenge and Multi-Sentence Reading Comprehension.
Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks
TLDR
It is conjectured that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise, and that creating new long-premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model.
SILT: Efficient transformer training for inter-lingual inference
TLDR
Evidence is found that SILT drastically reduces the number of trainable parameters while enabling inter-lingual NLI and achieving state-of-the-art performance on common benchmarks.
Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options
TLDR
This work investigates two alternative protocols which automatically create candidate (premise, hypothesis) pairs for annotators to label and concludes that crowdworker writing is still the best known option for entailment data.
Looking Beyond Sentence-Level Natural Language Inference for Question Answering and Text Summarization
TLDR
These findings show that the relatively shorter length of premises in traditional NLI datasets is the primary challenge prohibiting usage in downstream applications, and this challenge can be addressed by automatically converting resource-rich reading comprehension datasets into longer-premise NLI datasets.
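Such a conversion is mechanical once a question can be turned into a statement. The sketch below shows the general recipe under simplified assumptions (hypothetical field names, naive declarativization): the full passage becomes a long premise, the question plus a candidate answer becomes the hypothesis, and correctness determines the label.

```python
# Sketch: convert a QA example into a long-premise NLI example.
from dataclasses import dataclass

@dataclass
class NLIExample:
    premise: str     # the full passage becomes a long premise
    hypothesis: str  # question + candidate answer, rephrased as a statement
    label: str       # "entailment" if the candidate is correct, else "not_entailment"

def qa_to_nli(passage: str, question: str, candidate: str, is_correct: bool) -> NLIExample:
    hypothesis = f"{question.rstrip('?')} {candidate}."  # naive declarativization
    return NLIExample(
        premise=passage,
        hypothesis=hypothesis,
        label="entailment" if is_correct else "not_entailment",
    )

ex = qa_to_nli("The tower was completed in 1889.",
               "When was the tower completed?", "1889", True)
print(ex.label)  # entailment
```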
Reading Comprehension as Natural Language Inference:A Semantic Analysis
TLDR
This paper transforms one of the largest available MRC datasets (RACE) into NLI form, and compares the performance of a state-of-the-art model (RoBERTa) on both forms.
Answer Ranking for Product-Related Questions via Multiple Semantic Relations Modeling
TLDR
This paper proposes an answer ranking model named MUSE which carefully models multiple semantic relations among the question, answers, and relevant reviews, and achieves superior performance on the answer ranking task.
OCNLI: Original Chinese Natural Language Inference
TLDR
This paper presents the first large-scale NLI dataset for Chinese, the Original Chinese Natural Language Inference dataset (OCNLI), which closely follows the annotation protocol used for MNLI but introduces new strategies for eliciting diverse hypotheses.

References

Showing 1–10 of 30 references
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
TLDR
It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
SciTaiL: A Textual Entailment Dataset from Science Question Answering
TLDR
A new dataset and model for textual entailment, derived from treating multiple-choice question-answering as an entailment problem, is presented, and it is demonstrated that one can improve accuracy on SCITAIL by 5% using a new neural model that exploits linguistic structure.
An Entailment-Based Approach to the QA4MRE Challenge
TLDR
This paper describes an entry to the 2012 QA4MRE Main Task that estimates the likelihood of textual entailment between sentences in the text and the question Q and each candidate answer Ai, and finds sets of sentences SQ and SA that each plausibly entail Q or one of the Ai, respectively.
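A rough sketch of that set-based idea follows; the entailment scorer, the threshold, and the rule for combining the question's support with each candidate's support are illustrative placeholders rather than the system's actual components.

```python
# Sketch: entailment-set style answer selection (illustrative placeholders).
from typing import Callable, Dict, List, Set

def pick_answer(
    sentences: List[str],
    question: str,
    candidates: List[str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> str:
    # S_Q: sentences that plausibly entail the question.
    s_q: Set[int] = {i for i, s in enumerate(sentences)
                     if entail_prob(s, question) > threshold}
    # S_A: for each candidate, the sentences that plausibly entail it.
    support: Dict[str, Set[int]] = {
        a: {i for i, s in enumerate(sentences) if entail_prob(s, a) > threshold}
        for a in candidates
    }
    # Prefer the candidate whose supporting sentences overlap most with S_Q.
    return max(candidates, key=lambda a: len(support[a] & s_q))
```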
Answering Science Exam Questions Using Query Rewriting with Background Knowledge
TLDR
A system is presented that rewrites a given question into queries used to retrieve supporting text from a large corpus of science-related text, and it outperforms several strong baselines on the ARC dataset.
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
TLDR
A novel task is proposed to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods, in which a model learns to seek and combine evidence, effectively performing multi-hop (multi-step) inference.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TLDR
A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Question Answering by Reasoning Across Documents with Graph Convolutional Networks
TLDR
A neural model is introduced that integrates and reasons over information spread within and across multiple documents, achieving state-of-the-art results on the multi-document question answering dataset WikiHop.
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
TLDR
The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills, and human solvers achieve an F1-score of 88.1%.
Methods for Using Textual Entailment in Open-Domain Question Answering
TLDR
It is demonstrated how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (Q/A) systems.
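In its simplest form, this amounts to entailment-based re-ranking: build a hypothesis from the question and each candidate answer, then rank candidates by how strongly a retrieved passage entails that hypothesis. The scorer below is a placeholder for whatever RTE system is available, so treat the sketch as illustrative.

```python
# Sketch: re-rank QA candidates by passage-to-hypothesis entailment score.
from typing import Callable, List, Tuple

def rerank_answers(
    passage: str,
    question: str,
    candidates: List[str],
    entail_prob: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    # Turn each (question, candidate) pair into a naive answer statement and
    # score how strongly the passage entails it.
    scored = [(a, entail_prob(passage, f"{question.rstrip('?')} {a}."))
              for a in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```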
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.