Explaining Answers with Entailment Trees

@inproceedings{Dalvi2021ExplainingAW,
  title={Explaining Answers with Entailment Trees},
  author={Bhavana Dalvi and Peter Alexander Jansen and Oyvind Tafjord and Zhengnan Xie and Hannah Smith and Leighanna Pipatanangkura and Peter Clark},
  booktitle={EMNLP},
  year={2021}
}
Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by showing the line of reasoning from what is known to the answer, rather than simply showing a fragment of textual evidence (a “rationale”). If this could be done, new opportunities for understanding and debugging the system’s reasoning become possible. Our approach is to generate explanations in the form of entailment trees, namely a tree of multipremise entailment steps from facts that are known… 
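
A minimal sketch of how such an entailment tree might be represented as a data structure; the Node class, field names, and example sentences below are illustrative assumptions, not the paper's actual data format:

# Each node is either a known fact (a leaf) or a conclusion entailed
# jointly by its premise nodes; the root is the hypothesis being explained.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    sentence: str                                          # natural-language statement
    premises: List["Node"] = field(default_factory=list)   # empty for leaf facts

def print_tree(node: Node, depth: int = 0) -> None:
    # Render the line of reasoning top-down, from the hypothesis to leaf facts.
    label = "[fact]" if not node.premises else "[entailed]"
    print("  " * depth + f"{label} {node.sentence}")
    for premise in node.premises:
        print_tree(premise, depth + 1)

# Two known facts jointly entail an intermediate conclusion, which then
# combines with a third fact to entail the hypothesis.
fact1 = Node("Metals conduct electricity.")
fact2 = Node("Copper is a metal.")
int1 = Node("Copper conducts electricity.", premises=[fact1, fact2])
fact3 = Node("The wire is made of copper.")
hypothesis = Node("The wire conducts electricity.", premises=[int1, fact3])
print_tree(hypothesis)

Running the sketch prints the hypothesis, the intermediate conclusion it is entailed by, and the leaf facts supporting each step, mirroring the tree of multipremise entailment steps described above.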

Citations

METGEN: A Module-Based Entailment Tree Generation Framework for Answer Explanation
TLDR
A Module-based Entailment Tree GENeration framework with multiple modules and a reasoning controller that outperforms previous state-of-the-art models with only 9% of the parameters.
Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner
TLDR
This work proposes an architecture called Iterative Retrieval-Generation Reasoner (IRGR), able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises, allowing the model to leverage intermediate conclusions, and mitigating the input size limit of baseline encoder-decoder models.
Towards Teachable Reasoning Systems
TLDR
Generated chains of reasoning show how answers are implied by the system's own internal beliefs and are both faithful and truthful, suggesting new opportunities for using language models in an interactive setting where users can inspect, debug, correct, and improve a system's performance over time.
Natural Language Deduction through Search over Statement Compositions
TLDR
This work proposes a system for natural language deduction that decomposes the task into separate steps coordinated by best-first search, producing a tree of intermediate conclusions that faithfully reflects the system’s reasoning process.
Flexible Generation of Natural Language Deductions
TLDR
ParaPattern is described, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision that achieves 85% validity on examples of the ‘substitution’ operation from EntailmentBank without the use of any in-domain training data.
NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset
TLDR
NOAHQA is introduced, a conversational and bilingual QA dataset with questions requiring numerical reasoning over compound mathematical expressions, along with a new QA model for generating an interpretable reasoning graph.
Scientific Explanation and Natural Language: A Unified Epistemological-Linguistic Perspective for Explainable AI
A fundamental research goal for Explainable AI (XAI) is to build models that are capable of reasoning through the generation of natural language explanations. However, the methodologies to design…
Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning
TLDR
This work studies pre-trained language models that generate explanation graphs in an end-to-end manner, analyzes their ability to learn the structural constraints and semantics of such graphs, and proposes simple yet effective graph perturbations via node and edge edit operations that yield structurally and semantically positive and negative graphs.
DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models
TLDR
The empirical findings vindicate the overall framework and highlight the advantages of a modular design, in particular its ability to emulate established heuristics, to explore the model’s uncertainty, to cope with the plurality of correct solutions (underdetermination), and to exploit higher-order evidence.
LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models
TLDR
A new dataset is presented to evaluate the ability of models to perform logical inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation and in natural language.

References

Showing 1–10 of 35 references
WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-Hop Inference
TLDR
A corpus of explanations for standardized science exams, a recent challenge task for question answering, is presented, along with an explanation-centered tablestore: a collection of semi-structured tables that contain the knowledge needed to construct these elementary science explanations.
WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns supporting Multi-Hop Inference
TLDR
This work presents the second iteration of the WorldTree project, a corpus of 5,114 standardized science exam questions paired with large detailed multi-fact explanations that combine core scientific knowledge and world knowledge, and uses this explanation corpus to author a set of 344 high-level science domain inference patterns similar to semantic frames supporting multi-hop inference.
Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering
TLDR
A delexicalized chain representation, in which repeated noun phrases are replaced by variables to turn them into generalized reasoning chains, is explored; generalized chains maintain performance while also being more robust to certain perturbations.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
TLDR
It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
PRover: Proof Generation for Interpretable Reasoning over Rules
TLDR
This work proposes PROVER, an interpretable transformer-based model that jointly answers binary questions over rule-bases and generates the corresponding proofs, and learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TLDR
A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Benchmarking Applied Semantic Inference: The PASCAL Recognising Textual Entailment Challenges
TLDR
This paper describes the series of benchmarks developed for the textual entailment recognition task, known as the PASCAL RTE Challenges, and describes in detail the second RTE challenge, in which the methodology was consolidated and which served as a basis for the subsequent RTE challenges.
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
TLDR
This work introduces StrategyQA, a question answering benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy, and proposes a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Finding Generalizable Evidence by Learning to Convince Q&A Models
TLDR
This approach improves QA in a robust manner: using agent-selected evidence (i) humans can correctly answer questions with only ~20% of the full passage and (ii) QA models can generalize to longer passages and harder questions.
e-SNLI: Natural Language Inference with Natural Language Explanations
TLDR
The Stanford Natural Language Inference dataset is extended with an additional layer of human-annotated natural language explanations of the entailment relations, which can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets.