Corpus ID: 229371403

To what extent do human explanations of model behavior align with actual model behavior?

@article{Prasad2020ToWE,
  title={To what extent do human explanations of model behavior align with actual model behavior?},
  author={Grusha Prasad and Yixin Nie and Mohit Bansal and Robin Jia and Douwe Kiela and Adina Williams},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.13354}
}
Given the increasingly prominent role NLP models (will) play in our lives, it is important to evaluate models on their alignment with human expectations of how models behave. Using Natural Language Inference (NLI) as a case study, we investigated the extent to which human-generated explanations of models’ inference decisions align with how models actually make these decisions. More specifically, we defined two alignment metrics that quantify how well natural language human explanations align…
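The truncated abstract does not spell out the two alignment metrics, so the sketch below is purely illustrative rather than the paper's method: one plausible way to compare the words a human explanation points to against the tokens a model is most sensitive to, using occlusion-based token importance and a top-k overlap score. The function names, the occlusion attribution, and the overlap scoring are all assumptions introduced here for illustration.

```python
# Hypothetical sketch (not the paper's actual metrics): score how well the words
# a human explanation mentions overlap with the tokens a model is most sensitive to.

from typing import Callable, List


def token_importance(predict_prob: Callable[[List[str]], float],
                     tokens: List[str]) -> List[float]:
    """Occlusion importance: drop in predicted probability when a token is removed."""
    base = predict_prob(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        scores.append(base - predict_prob(occluded))
    return scores


def explanation_alignment(explanation: str,
                          tokens: List[str],
                          importance: List[float],
                          top_k: int = 3) -> float:
    """Fraction of the model's top-k most important tokens that the human
    explanation also mentions (0 = no overlap, 1 = full overlap)."""
    mentioned = {w.lower().strip(".,!?'\"") for w in explanation.split()}
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    top = ranked[:top_k]
    hits = sum(1 for i in top if tokens[i].lower() in mentioned)
    return hits / max(1, len(top))


# Toy usage with a stand-in prediction function (hypothetical):
# tokens = "A man is sleeping on a bench".split()
# importance = token_importance(model_prob_of_contradiction, tokens)
# score = explanation_alignment("The model focused on sleeping", tokens, importance)
```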
