Reasoning Over Paragraph Effects in Situations

@article{Lin2019ReasoningOP,
  title={Reasoning Over Paragraph Effects in Situations},
  author={Kevin Lin and Oyvind Tafjord and Peter Clark and Matt Gardner},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.05852}
}
A key component of successfully reading a passage of text is the ability to apply knowledge gained from the passage to a new situation. In order to facilitate progress on this kind of reading, we present ROPES, a challenging benchmark for reading comprehension targeting Reasoning Over Paragraph Effects in Situations. We target expository language describing causes and effects (e.g., “animal pollinators increase efficiency of fertilization in flowers”), as they have clear implications for new… 

On Making Reading Comprehension More Comprehensive
TLDR
This work justifies a question answering approach to reading comprehension and describes the various kinds of questions one might use to more fully test a system’s comprehension of a passage, moving beyond questions that only probe local predicate-argument structures.
Towards Interpretable Reasoning over Paragraph Effects in Situation
TLDR
A sequential approach to reasoning over paragraph effects in situations is proposed that explicitly models each step of the reasoning process with neural network modules, leading to a more interpretable model.
Procedural Reading Comprehension with Attribute-Aware Context Flow
TLDR
An algorithm for procedural reading comprehension is introduced by translating the text into a general formalism that represents processes as a sequence of transitions over entity attributes (e.g., location, temperature).
CURIE: An Iterative Querying Approach for Reasoning About Situations
TLDR
CURIE is proposed, a method that iteratively builds a graph of relevant consequences, represented explicitly as a structured situational graph (st graph), using natural language queries over a finetuned language model. Its improvements are shown to come mainly from a hard subset of the data that requires background knowledge and multi-hop reasoning.
Comprehensive Multi-Dataset Evaluation of Reading Comprehension
TLDR
An evaluation server, ORB, is presented, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing a single model’s capability in understanding a wide variety of reading phenomena.
Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading
TLDR
This work proposes Discern, a discourse-aware entailment reasoning network to strengthen the connection and enhance the understanding for both document and dialog, and splits the document into clause-like elementary discourse units using a pre-trained discourse segmentation model.
Transformers as Soft Reasoners over Language
TLDR
This work trains transformers to reason (or emulate reasoning) over natural language sentences using synthetically generated data, thus bypassing a formal representation and suggesting a new role for transformers, namely as limited "soft theorem provers" operating over explicit theories in language.
A Survey on Machine Reading Comprehension Systems
TLDR
It is demonstrated that the focus of research has changed in recent years from answer extraction to answer generation, from single to multi-document reading comprehension, and from learning from scratch to using pre-trained embeddings.
How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
TLDR
This paper examines three strong generative models (T5, BART, and GPT-2) and methods to calibrate them so that their confidence scores correlate better with the likelihood of correctness, through fine-tuning, post-hoc probability modification, or adjustment of the predicted outputs or inputs.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR
The largest survey of the field to date of question answering and reading comprehension, providing an overview of the various formats and domains of the current resources, and highlighting the current lacunae for future work.

References

SHOWING 1-10 OF 18 REFERENCES
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
TLDR
The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills, and finds human solvers to achieve an F1-score of 88.1%.
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
TLDR
A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
Interpretation of Natural Language Rules in Conversational Machine Reading
TLDR
This paper formalises the task and develops a crowd-sourcing strategy to collect 37k task instances based on real-world rules and crowd-generated questions and scenarios, and assesses the task's difficulty by evaluating the performance of rule-based and machine-learning baselines.
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
TLDR
MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Gated Self-Matching Networks for Reading Comprehension and Question Answering
TLDR
Gated self-matching networks for reading comprehension style question answering, which aims to answer questions from a given passage, are presented; the model holds first place on the SQuAD leaderboard for both single and ensemble models.
RACE: Large-scale ReAding Comprehension Dataset From Examinations
TLDR
The proportion of questions that requires reasoning is much larger in RACE than that in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models and the ceiling human performance.
QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships
TLDR
This work makes inroads into answering complex, qualitative questions that require reasoning, and scaling to new relationships at low cost, with two novel models for this task built as extensions of type-constrained semantic parsing.
A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
TLDR
A thorough examination of this new reading comprehension task by creating over a million training examples by pairing CNN and Daily Mail news articles with their summarized bullet points, and showing that a neural network can be trained to give good performance on this task.
Natural Questions: A Benchmark for Question Answering Research
TLDR
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.