FRAME: Evaluating Simulatability Metrics for Free-Text Rationales

@article{Chan2022FRAMEES,
  title={FRAME: Evaluating Simulatability Metrics for Free-Text Rationales},
  author={Aaron Chan and Shaoliang Nie and Liang Tan and Xiaochang Peng and Hamed Firooz and Maziar Sanjabi and Xiang Ren},
  journal={ArXiv},
  year={2022},
  volume={abs/2207.00779}
}
Free-text rationales aim to explain neural language model (LM) behavior more flexibly and intuitively via natural language. To ensure rationale quality, it is important to have metrics for measuring rationales’ faithfulness (reflects the LM’s actual behavior) and plausibility (convincing to humans). All existing free-text rationale metrics are based on simulatability (association between rationale and LM’s predicted label), but there is no protocol for assessing such metrics’ reliability. To… 
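
For concreteness, simulatability-style metrics generally score a rationale by how much it helps a simulator (a human or another model) recover the task LM's predicted label. Below is a minimal, hedged sketch of that idea; the `simulator_predict` callable and the example fields are illustrative assumptions, not FRAME's actual evaluation protocol.

```python
# Illustrative sketch (not FRAME's exact protocol): simulatability as the gain
# in a simulator's ability to recover the task LM's predicted label when the
# free-text rationale is added to the input. `simulator_predict` is a
# hypothetical stand-in for any simulator (a human proxy or another model).

def simulatability(examples, simulator_predict):
    """examples: iterable of dicts with 'input', 'rationale', 'model_label'."""
    gain, n = 0, 0
    for ex in examples:
        pred_with = simulator_predict(ex["input"], rationale=ex["rationale"])
        pred_without = simulator_predict(ex["input"], rationale=None)
        gain += int(pred_with == ex["model_label"]) - int(pred_without == ex["model_label"])
        n += 1
    # Positive score: the rationale helps the simulator recover the LM's label.
    return gain / n
```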

References

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

TLDR
UNIREX is a flexible learning framework that generalizes rationale extractor optimization; it also introduces the Normalized Relative Gain (NRG) metric and finds that UNIREX-trained rationale extractors’ faithfulness can even generalize to unseen datasets and tasks.
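
NRG is only named in this summary; as a hedged reading of how such an aggregate can be computed, the sketch below min-max normalizes each evaluation criterion across candidate extractors and averages the normalized scores into one number per candidate. The function and the exact normalization are assumptions, not necessarily UNIREX's definition.

```python
# Hedged sketch of a Normalized Relative Gain (NRG)-style aggregation: each
# criterion is min-max normalized across candidate extractors, then averaged
# into one score per candidate. All names here are illustrative.

def normalized_relative_gain(scores):
    """scores: dict mapping candidate -> {metric_name: raw_value};
    higher is assumed better for every metric in this sketch."""
    metrics = list(next(iter(scores.values())).keys())
    totals = {c: 0.0 for c in scores}
    for m in metrics:
        values = [scores[c][m] for c in scores]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # all candidates tied -> avoid division by zero
        for c in scores:
            totals[c] += (scores[c][m] - lo) / span
    return {c: total / len(metrics) for c, total in totals.items()}
```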

Measuring Association Between Labels and Free-Text Rationales

TLDR
It is demonstrated that pipelines, models for faithful rationalization on information-extraction-style tasks, do not work as well on “reasoning” tasks requiring free-text rationales, while state-of-the-art T5-based joint models exhibit desirable properties for explaining commonsense question answering and natural language inference.

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

TLDR
A leakage-adjusted simulatability (LAS) metric is introduced for evaluating NL explanations, which measures how well explanations help an observer predict a model’s output, while controlling for how explanations can directly leak the output.
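
As a rough illustration of the LAS idea, the sketch below queries a simulator three ways (input plus explanation, input alone, explanation alone), groups examples by whether the explanation alone leaks the model's label, and averages the with/without-explanation gain within and then across the two groups. Function and field names are assumptions for illustration, not the paper's released code.

```python
# Rough sketch of leakage-adjusted simulatability (LAS). A simulator is
# queried three ways: input + explanation, input alone, and explanation alone
# (to detect label leakage). Names are illustrative, not the paper's code.

def las(examples, simulate):
    """examples: dicts with 'input', 'explanation', 'model_label'.
    simulate(input_text, explanation) -> predicted label; either arg may be None.
    """
    groups = {0: [], 1: []}  # 0 = non-leaking, 1 = leaking
    for ex in examples:
        y = ex["model_label"]
        leaked = int(simulate(None, ex["explanation"]) == y)
        gain = (int(simulate(ex["input"], ex["explanation"]) == y)
                - int(simulate(ex["input"], None) == y))
        groups[leaked].append(gain)
    # Average the mean gain within each leakage subgroup, then across subgroups,
    # so label-leaking explanations cannot dominate the score.
    subgroup_means = [sum(g) / len(g) for g in groups.values() if g]
    return sum(subgroup_means) / len(subgroup_means)
```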

Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?

TLDR
The current binary definition of faithfulness sets a potentially unrealistic bar for being considered faithful; this work calls for discarding the binary notion of faithfulness in favor of a graded one, which is of greater practical utility.

ERASER: A Benchmark to Evaluate Rationalized NLP Models

TLDR
This work proposes the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP, along with several metrics that aim to capture both how well the rationales provided by models align with human rationales and how faithful these rationales are.
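
Among ERASER's faithfulness metrics are comprehensiveness and sufficiency, which compare the model's predicted-class probability on the full input, the input with the rationale removed, and the rationale alone. The sketch below assumes a hypothetical `predict_proba` helper and is only illustrative of those two scores.

```python
# Illustrative sketch of ERASER-style faithfulness scores for an extractive
# rationale. `predict_proba(text)` is an assumed helper returning the model's
# probability for its originally predicted class on the given text.

def comprehensiveness(predict_proba, full_input, input_without_rationale):
    # Large drop when rationale tokens are removed => the rationale was needed.
    return predict_proba(full_input) - predict_proba(input_without_rationale)

def sufficiency(predict_proba, full_input, rationale_only):
    # Small value => the rationale alone nearly reproduces the prediction.
    return predict_proba(full_input) - predict_proba(rationale_only)
```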

Towards Explainable NLP: A Generative Explanation Framework for Text Classification

TLDR
A novel generative explanation framework that learns to make classification decisions and generate fine-grained explanations at the same time, introducing an explainable factor and a minimum risk training approach to generate more reasonable explanations.

NILE : Natural Language Inference with Faithful Natural Language Explanations

TLDR
This work proposes Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce labels along with faithful explanations, and demonstrates NILE’s effectiveness over previously reported methods through automated and human evaluation of the produced labels and explanations.

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

TLDR
This work collects human explanations for commonsense reasoning, in the form of natural language sequences and highlighted annotations, in a new dataset called Common Sense Explanations, and uses it to train language models that automatically generate explanations for use during training and inference in a novel Commonsense Auto-Generated Explanation framework.

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

TLDR
Human subject tests, the first of their kind, isolate the effect of algorithmic explanations on a key aspect of model interpretability, simulatability, while avoiding important confounding experimental factors.

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

TLDR
It is quantitatively shown that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision, supporting the thesis that multimodal explanation models offer significant benefits over unimodal approaches.
...