Human Interpretation of Saliency-based Explanation Over Text

@inproceedings{Schuff2022HumanIO,
  title={Human Interpretation of Saliency-based Explanation Over Text},
  author={Hendrik Schuff and Alon Jacovi and Heike Adel and Yoav Goldberg and Ngoc Thang Vu},
  booktitle={2022 ACM Conference on Fairness, Accountability, and Transparency},
  year={2022}
}
While a lot of research in explainable AI focuses on producing effective explanations, less work is devoted to the question of how people understand and interpret the explanation. In this work, we focus on this question through a study of saliency-based explanations over textual data. Feature-attribution explanations of text models aim to communicate which parts of the input text were more influential than others towards the model decision. Many current explanation methods, such as gradient… 

How (Not) To Evaluate Explanation Quality

This paper substantiates theoretical claims (i.e., the lack of validity and temporal decline of currently used proxy scores) with empirical evidence from a crowdsourcing case study in which it investigates the explanation quality of state-of-the-art explainable question answering systems.

Towards Human-centered Explainable AI: User Studies for Model Explanations

This survey shows that XAI is spreading more rapidly in certain application domains, such as recommender systems, than in others, but that user evaluations are still rather sparse and incorporate hardly any insights from the cognitive or social sciences.

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

A new faithfulness benchmark called Recursive ROAR is proposed, which works by recursively masking allegedly important tokens and then retraining the model, and shows that the faithfulness of importance measures is both model-dependent and task-dependent.

Mediators: Conversational Agents Explaining NLP Model Behavior

Desiderata for Mediators, text-based conversational agents capable of explaining the behavior of neural models interactively using natural language, are established from the perspective of natural language processing research.

Identifying Human Strategies for Generating Word-Level Adversarial Examples

A detailed analysis of exactly how humans create adversarial examples against Transformer models is provided, which identifies statistically significant tendencies in which words humans prefer to select for adversarial replacement as well as where and when words are replaced in an input sequence.

Constructing Natural Language Explanations via Saliency Map Verbalization

The results suggest that saliency map verbalization makes explanations more understandable and less cognitively challenging to humans than conventional heatmap visualization.

References

Showing 1-10 of 58 references

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

In a crowdsourcing study where participants interact with deception detection models trained to distinguish between genuine and fake hotel reviews, it is observed that, for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase compared to the no-explanation control.

Explanation in Artificial Intelligence: Insights from the Social Sciences

What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods

It is demonstrated that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios, suggesting a critical need to develop better explainability methods and to deploy human-centered evaluation approaches.

Saliency Maps Generation for Automatic Text Summarization

This paper applies Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained on a text summarization dataset and shows that the saliency maps obtained sometimes capture the real use of the input features by the network, and sometimes do not.

Challenging common interpretability assumptions in feature attribution explanations

It is found that feature attribution explanations provide marginal utility in the authors' task for a human decision maker and in certain cases result in worse decisions due to cognitive and contextual confounders.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.

The Who in Explainable AI: How AI Background Shapes Perceptions of AI Explanations

A mixed-methods study of how two different groups of whos—people with and without a background in AI—perceive different types of AI explanations, finding that both groups had unwarranted faith in numbers, to different extents and for different reasons.

Attention is not not Explanation

It is shown that even when reliable adversarial distributions can be found, they don’t perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.

Sanity Checks for Saliency Maps

It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

A new faithfulness benchmark called Recursive ROAR is proposed, which works by recursively masking allegedly important tokens and then retraining the model, and shows that the faithfulness of importance measures is both model-dependent and task-dependent.
...