Transforming Question Answering Datasets Into Natural Language Inference Datasets

Dorottya Demszky, Kelvin Guu, Percy Liang · Corpus ID: 52182179

Existing datasets for natural language inference (NLI) have propelled research on language understanding. We propose a new method for automatically deriving NLI datasets from the growing abundance of large-scale question answering datasets. Our approach hinges on learning a sentence transformation model which converts question-answer pairs into their declarative forms. Despite being primarily trained on a single QA dataset, we show that it can be successfully applied to a variety of other QA…
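As a rough illustration of the recasting idea (the paper learns a sentence transformation model; the rule-based function below is only a hypothetical sketch handling simple "What/Who is X?" patterns), a QA pair can be turned into a declarative hypothesis like this:

```python
def qa_to_hypothesis(question: str, answer: str) -> str:
    """Naive rule-based QA -> declarative hypothesis (illustration only).

    Substitutes the answer back into simple wh-copula questions;
    the actual paper trains a model rather than using rules.
    """
    q = question.strip().rstrip("?")
    for wh in ("What", "Who", "Which"):
        for copula in ("is", "was", "are", "were"):
            prefix = f"{wh} {copula} "
            if q.startswith(prefix):
                rest = q[len(prefix):]
                return f"{rest} {copula} {answer}."
    # Fallback for patterns this toy rule cannot handle
    return f"{q}: {answer}."

# The resulting hypothesis can then be paired with the answer's source
# passage as the premise, yielding an (premise, hypothesis) NLI example.
hyp = qa_to_hypothesis("What is the capital of France?", "Paris")
# hyp == "the capital of France is Paris."
```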

Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks

It is conjectured that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise, and that creating new long-premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model.

Enhancing Natural Language Inference Using New and Expanded Training Data Sets and New Learning Models

A modification is proposed to the “word-to-word” attention function that has been uniformly reused across several popular NLI architectures; the resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform strongly on new benchmarks that emphasize “roles” and “entities”.

Looking Beyond Sentence-Level Natural Language Inference for Question Answering and Text Summarization

These findings show that the relatively short length of premises in traditional NLI datasets is the primary challenge prohibiting their use in downstream applications, and that this challenge can be addressed by automatically converting resource-rich reading comprehension datasets into longer-premise NLI datasets.

Reading Comprehension as Natural Language Inference: A Semantic Analysis

This paper transforms one of the largest available MRC datasets (RACE) into NLI form and compares the performance of a state-of-the-art model (RoBERTa) on both forms.

Can NLI Models Verify QA Systems' Predictions?

Careful manual analysis over the predictions of the NLI model shows that it can further identify cases where the QA model produces the right answer for the wrong reason, i.e., when the answer sentence does not address all aspects of the question.

DocNLI: A Large-scale Dataset for Document-level Natural Language Inference

This work presents DOCNLI — a newly-constructed large-scale dataset for document-level NLI that shows promising performance on popular sentence-level benchmarks, and generalizes well to out-of-domain NLP tasks that rely on inference at document granularity.

Use of Natural Language Inference in Optimizing Reviews and Providing Insights to end Consumers

Recognizing Textual Entailment, wherein the task is to recognize whether a given hypothesis is true (entailment), false (contradiction), or unrelated (neutral) with respect to a sentence called the premise, is applied to bring sustainable improvements to the classification methods used by major e-commerce companies.

Testing the Reasoning Power for NLI Models with Annotated Multi-perspective Entailment Dataset

A Multi-perspective Entailment Category Labeling System (METALs) is proposed, consisting of three categories and ten sub-categories; 3,368 entailment items are manually annotated to explain the recognition ability of four NN-based models at a fine-grained level.

Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options

This work investigates two alternative protocols which automatically create candidate (premise, hypothesis) pairs for annotators to label and concludes that crowdworker writing is still the best known option for entailment data.

FarsTail: A Persian Natural Language Inference Dataset

A new dataset is presented for the NLI task in the Persian language (also known as Farsi), one of the dominant languages in the Middle East; the best obtained test accuracy is 78.13%, which shows that there is considerable room for improving current methods to be useful for real-world NLP applications in different languages.

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning.

Towards a Unified Natural Language Inference Framework to Evaluate Sentence Representations

This work generates a large-scale NLI dataset by recasting 11 existing datasets from 7 different semantic tasks, and uses this dataset of approximately half a million context-hypothesis pairs to test how well sentence encoders capture distinct semantic phenomena that are necessary for general language understanding.

Generating Natural Language Inference Chains

A new task is proposed that measures how well a model can generate an entailed sentence from a source sentence. Entailment pairs from the Stanford Natural Language Inference corpus are used to train an LSTM with attention, and the model is applied recursively to input-output pairs, thereby generating natural language inference chains.

A large annotated corpus for learning natural language inference

The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
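The hypothesis-only artifact described above can be illustrated in miniature with a toy rule-based classifier (the cue words and examples below are invented for illustration and are not drawn from SNLI; the paper itself uses a trained text categorization model):

```python
# Toy "hypothesis-only" classifier: it predicts a label from cue words
# in the hypothesis alone, never reading the premise. The negation-cue
# heuristic mirrors the reported artifact that negation words correlate
# with the contradiction class.
NEGATION_CUES = {"not", "no", "never", "nobody"}

def hypothesis_only_predict(hypothesis: str) -> str:
    """Predict an NLI label without ever seeing the premise."""
    tokens = set(hypothesis.lower().split())
    if NEGATION_CUES & tokens:
        return "contradiction"
    return "entailment"

# Invented examples: on artifact-laden data, even this premise-blind
# rule would score well above chance.
label = hypothesis_only_predict("A man is not sleeping.")
# label == "contradiction"
```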

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.

Crowdsourcing Question-Answer Meaning Representations

A crowdsourcing scheme is developed to show that QAMRs can be labeled with very little training, and a qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets.

Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language

The results show that non-expert annotators can produce high-quality QA-SRL data and establish baseline performance levels for future work on this task; simple classifier-based models are also introduced for predicting which questions to ask and what their answers should be.