Learning with Instance Bundles for Reading Comprehension

@inproceedings{dua2021learning,
  title={Learning with Instance Bundles for Reading Comprehension},
  author={Dheeru Dua and Pradeep Dasigi and Sameer Singh and Matt Gardner},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
}
When training most modern reading comprehension models, all the questions associated with a context are treated as independent of each other. However, closely related questions and their corresponding answers are not independent, and leveraging these relationships could provide a strong supervision signal to a model. Drawing on ideas from contrastive estimation, we introduce several new supervision losses that compare question-answer scores across multiple related instances…
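The abstract stops short of defining the losses, but the core idea of contrastive estimation over a bundle can be sketched roughly as follows. The function name and the exact normalization here are illustrative assumptions, not the paper's formulation: each question's score is normalized over every answer in its bundle, so closely related but incorrect answers serve as hard negatives.

```python
import math

def bundle_contrastive_loss(scores, correct):
    """Illustrative contrastive loss over one instance bundle.

    scores[i][j] = model score for answering question i with answer j,
    where all questions and answers belong to one bundle of related
    instances.
    correct[i]   = index of the gold answer for question i.

    For each question, probability mass is normalized over ALL answers
    in the bundle (not only its own candidates), so related-but-wrong
    answers act as hard negatives.
    """
    total = 0.0
    for i, row in enumerate(scores):
        log_z = math.log(sum(math.exp(s) for s in row))
        total += -(row[correct[i]] - log_z)  # negative log-likelihood
    return total / len(scores)
```

When the gold answers score well above the distractors, the loss is near zero; when the model cannot distinguish the bundled answers at all, it reduces to log of the bundle size.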


Retrieval-guided Counterfactual Generation for QA

This work develops a Retrieve-Generate-Filter (RGF) technique to create counterfactual evaluation and training data with minimal human supervision, and finds that RGF data leads to significant improvements in a model’s robustness to local perturbations.

Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition

This work introduces the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs, and demonstrates the effectiveness of BPB by creating evaluation sets for three reading comprehension benchmarks, generating thousands of high-quality examples without human intervention.

Event-Centric Question Answering via Contrastive Learning and Invertible Event Transformation

Qualitative analysis reveals the high quality of the answers generated by TranCLR, demonstrating the feasibility of injecting event knowledge into QA model learning.

Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization

This paper presents a novel data augmentation technique combined with a tailored learning procedure for the task, and shows that ACDA-boosted pre-trained language models employing the combined approach consistently outperform their fine-tuned baseline counterparts across both benchmark datasets and adversarial examples.

CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation

Experiments on natural language inference and sentiment analysis benchmarks show that CORE counterfactuals are more effective at improving generalization to OOD data compared to other DA approaches, and the CORE retrieval framework can be used to encourage diversity in manually authored perturbations.

Open Temporal Relation Extraction for Question Answering

This paper decomposes each question into a question event and an open temporal relation (OTR), which is neither pre-defined nor timestamped, and grounds the former in the context while sharing the representation of the latter across contexts.

Successive Prompting for Decomposing Complex Questions

A way to generate a synthetic dataset that bootstraps a model’s ability to decompose and answer intermediate questions is introduced, achieving an F1 improvement of ~5% over a state-of-the-art model with synthetic augmentations on the few-shot version of the DROP dataset.

Logic-Guided Data Augmentation and Regularization for Consistent Question Answering

This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by integrating logic rules with neural models: it leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
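The consistency-based regularizer is only named above. One common form of such a regularizer for comparison questions is a symmetry constraint: if the model answers "yes" to "Is A larger than B?", it should answer "no" to the logically flipped question. A minimal sketch of that constraint follows; it is purely illustrative and not the paper's exact loss.

```python
def symmetry_consistency_penalty(p_yes_q, p_yes_q_flipped):
    """Penalize logically inconsistent answers to a comparison question.

    p_yes_q         : model's P(yes) for a question q,
                      e.g. "Is A larger than B?"
    p_yes_q_flipped : model's P(yes) for the logically flipped
                      question q'.

    Symmetry says the two probabilities should sum to 1; the penalty
    measures the violation of that constraint. (Illustrative sketch
    only; the paper combines such logic-derived constraints with data
    augmentation during training.)
    """
    return abs(p_yes_q + p_yes_q_flipped - 1.0)
```

In training, a term like this would be added to the usual answer loss so that the model is penalized for confidently answering "yes" to both a comparison and its negation.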

Paired Examples as Indirect Supervision in Latent Decision Models

A way to leverage paired examples as indirect supervision for latent decisions is introduced; it provides stronger learning cues, improves both in- and out-of-distribution generalization, and leads to correct latent decision predictions.

Adversarial Examples for Evaluating Reading Comprehension Systems

This work proposes an adversarial evaluation scheme for the Stanford Question Answering Dataset that tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences without changing the correct answer or misleading humans.

RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering

A simple and effective re-ranking approach (RECONSIDER) for span-extraction tasks that improves upon the performance of large pre-trained MRC models and achieves a new state of the art on four QA tasks.

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

This work presents a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia, and shows that state-of-the-art reading comprehension models perform significantly worse than humans on this benchmark.

Evaluating Models’ Local Decision Boundaries via Contrast Sets

A more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data; the authors recommend that dataset authors manually perturb test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

Are Red Roses Red? Evaluating Consistency of Question-Answering Models

A method to automatically extract implications of instances from two QA datasets, VQA and SQuAD, which is used to evaluate model consistency and shows that the generated implications are well-formed and valid.

Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

This work proposes an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets, namely counterfactual examples, which provide a signal indicative of the underlying causal structure of the task.

An Analysis of the Utility of Explicit Negative Examples to Improve the Syntactic Abilities of Neural Language Models

This paper demonstrates that appropriately using negative examples for particular constructions boosts the model’s robustness on them in English, with a negligible loss in perplexity, and can serve as a tool to analyze the true architectural limitations of neural models on challenging linguistic constructions.