An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

@article{Paranjape2020AnIB,
  title={An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction},
  author={Bhargavi Paranjape and Mandar Joshi and John Thickstun and Hannaneh Hajishirzi and Luke Zettlemoyer},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.00652}
}
Decisions of complex language understanding models can be rationalized by limiting their inputs to a relevant subsequence of the original text. A rationale should be as concise as possible without significantly degrading task performance, but this balance can be difficult to achieve in practice. In this paper, we show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective. Our fully unsupervised approach jointly learns an explainer…
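As a brief orientation to what "optimizing a bound on the Information Bottleneck (IB) objective" means here, the following is a minimal sketch; the notation (mask variable Z, trade-off weight β, task distribution q, mask prior r) is chosen for illustration and is not necessarily the paper's. The IB objective asks the rationale Z extracted from input X to be maximally predictive of the label Y while compressing X:

\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(X; Z)

Because the mutual-information terms are intractable, methods in this family minimize a variational surrogate that upper-bounds the negated objective (up to constants); for binary token or sentence masks it takes roughly the form

\mathbb{E}_{z \sim p(z \mid x)}\!\left[ -\log q(y \mid z \odot x) \right] \;+\; \beta \, \mathrm{KL}\!\left( p(z \mid x) \,\|\, r(z) \right),

where r(z) is a fixed sparse prior over masks; its sparsity parameter (e.g., a Bernoulli probability π) is the knob that gives explicit control over rationale conciseness.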

Citations

Weakly- and Semi-supervised Evidence Extraction
New methods are proposed to combine a few evidence annotations (strong semi-supervision) with abundant document-level labels (weak supervision) for the task of evidence extraction.
EASE: Extractive-Abstractive Summarization with Explanations
This work presents an explainable summarization system based on the Information Bottleneck principle that is jointly trained for extraction and abstraction in an end-to-end fashion; explanations from this framework are shown to be more relevant than those from simple baselines, without substantially sacrificing the quality of the generated summary.
Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU
This work develops a multi-task teacher-student framework based on self-training language models with limited task-specific labels and rationales, and demonstrates that neural model performance can be significantly improved by making the model aware of its rationalized predictions, particularly in low-resource settings.
SPECTRA: Sparse Structured Text Rationalization
This paper presents a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer, and provides a comparative study of stochastic and deterministic methods for rationale extraction on classification and natural language inference tasks.
Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension
This work proposes an abstractive approach that generates a question-focused summary of the input paragraphs and then feeds it to a reading comprehension (RC) system; with limited supervision, it can generate more compact explanations than an extractive explainer while maintaining sufficiency.
Are Training Resources Insufficient? Predict First Then Explain!
This work argues that the predict-then-explain (PtE) architecture is a more efficient approach from a modelling perspective, and shows that the PtE structure is the most data-efficient approach when explanation data are lacking.
Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models
This paper shows that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, that capability can be transferred to a low-resource task in another domain, reducing the need for human annotations.
Diagnostics-Guided Explanation Generation
This work shows how to directly optimise for Faithfulness and Confidence Indication when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.
Distribution Matching for Rationalization
This work argues that it is crucial to incorporate an additional desideratum on the rationales into modeling, and proposes a novel distribution matching approach for rationalization that consistently outperforms previous methods by a large margin.
Do Explanations Help Users Detect Errors in Open-Domain QA? An Evaluation of Spoken vs. Visual Explanations
It is shown that explanations derived from retrieved evidence can outperform strong baselines across modalities, but that the best explanation strategy varies with the modality; the importance of end-to-end evaluation of explanations is emphasized.

References

Showing the first 10 of 39 references.
Explaining Question Answering Models through Text Generation
A model for multi-choice question answering is presented, in which an LM-based generator produces a textual hypothesis that is later used by a classifier to answer the question; the generated hypotheses elucidate the knowledge used by the LM to answer the question.
Interpretable Neural Predictions with Differentiable Binary Variables
This work proposes a latent model that mixes discrete and continuous behaviour, allowing binary selections while supporting gradient-based training without REINFORCE; it can tractably compute the expected value of penalties such as L0, which allows the model to be directly optimised towards a pre-specified text selection rate.
Rationalizing Neural Predictions
The approach combines two modular components, a generator and an encoder, which are trained to operate well together: the generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction.
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle
This paper proposes a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, the approach seeks a compressed sentence that can best predict the next sentence.
ERASER: A Benchmark to Evaluate Rationalized NLP Models
This work proposes the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP, along with several metrics that aim to capture how well the rationales provided by models align with human rationales and how faithful those rationales are.
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
It is found that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that such transfer, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT.
FEVER: a large-scale dataset for Fact Extraction and VERification
This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.
Explain Yourself! Leveraging Language Models for Commonsense Reasoning
This work collects human explanations for commonsense reasoning, in the form of natural language sequences and highlighted annotations, in a new dataset called Common Sense Explanations, and uses it to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation framework.
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills; human solvers are found to achieve an F1-score of 88.1%.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.