Corpus ID: 238408254

Interactively Generating Explanations for Transformer Language Models

@article{Schramowski2021InteractivelyGE,
  title={Interactively Generating Explanations for Transformer Language Models},
  author={Patrick Schramowski and Felix Friedrich and Christopher Tauchmann and Kristian Kersting},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.02058}
}
Transformer language models are state-of-the-art in a multitude of NLP tasks. Despite these successes, their opaqueness remains problematic. Recent methods aiming to provide interpretability and explainability to black-box models primarily focus on post-hoc explanations of (sometimes spurious) input-output correlations. Instead, we emphasize using prototype networks directly incorporated into the model architecture and hence explain the reasoning process behind the network’s decisions. Moreover… 
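
As a rough illustration of the idea sketched in the abstract, the snippet below shows how a prototype layer could sit between a transformer's sentence embedding and the classifier, so that each decision can be traced back to its most similar prototypes. This is a minimal PyTorch sketch with invented names and dimensions (`PrototypeHead`, 768-dimensional embeddings, 10 prototypes), not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): a prototype layer that
# scores a transformer sentence embedding against a set of learned prototypes,
# so the class decision can be traced back to the most similar prototypes.
import torch
import torch.nn as nn


class PrototypeHead(nn.Module):
    def __init__(self, embed_dim: int = 768, num_prototypes: int = 10, num_classes: int = 2):
        super().__init__()
        # Learned prototype vectors living in the same space as the encoder output.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        # Linear layer mapping prototype similarities to class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, sentence_embedding: torch.Tensor):
        # Negative squared distance as a similarity score (higher = closer).
        distances = torch.cdist(sentence_embedding, self.prototypes)  # (batch, num_prototypes)
        similarities = -distances.pow(2)
        logits = self.classifier(similarities)
        # Returning the similarities makes the decision inspectable:
        # each prediction can be explained by its closest prototypes.
        return logits, similarities


# Usage with a stand-in embedding (in practice this would come from the LM encoder).
head = PrototypeHead()
emb = torch.randn(4, 768)
logits, sims = head(emb)
print(logits.shape, sims.argmax(dim=-1))
```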

Citations

Concept-level Debugging of Part-Prototype Networks

ProtoPDebug is proposed, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, specifies which part-prototypes must be forgotten or kept; the model is then tuned to align with this supervision.
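
As an illustration of how such forget/keep feedback could enter training, the sketch below adds a penalty on the activations of prototypes a supervisor marked as spurious. The function name, shapes, and loss form are assumptions for illustration, not ProtoPDebug's actual code.

```python
# Illustrative sketch only (not the ProtoPDebug code): turning "forget these
# part-prototypes" feedback into an extra loss that penalises high activation
# of the forbidden prototypes, while the standard task loss keeps the rest intact.
import torch


def debugging_loss(similarities: torch.Tensor, forbidden: list[int], weight: float = 1.0) -> torch.Tensor:
    # similarities: (batch, num_prototypes) activations of all part-prototypes.
    # forbidden: indices the human supervisor marked as spurious / to be forgotten.
    if not forbidden:
        return similarities.new_zeros(())
    # Penalise any positive activation of the forbidden prototypes.
    return weight * similarities[:, forbidden].clamp(min=0).mean()


sims = torch.rand(8, 10, requires_grad=True)
penalty = debugging_loss(sims, forbidden=[2, 7])
penalty.backward()  # gradients now push the forbidden activations down
```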

A Review on Language Models as Knowledge Bases

This paper presents a set of aspects that a language model (LM) should have in order to fully act as a knowledge base (KB), and reviews the recent literature with respect to those aspects.

References

Showing 1–10 of 38 references

This looks like that: deep learning for interpretable image recognition

A deep network architecture, the prototypical part network (ProtoPNet), is presented that reasons in a way similar to how ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks, and that provides a level of interpretability absent in other interpretable deep models.
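
The core computation can be pictured as comparing every spatial patch of a convolutional feature map against each learned prototype and keeping the best match ("this looks like that"). The sketch below is illustrative, with made-up shapes, and is not the reference ProtoPNet code.

```python
# Rough sketch of the ProtoPNet idea (illustrative, not the reference code):
# each prototype is compared against every spatial patch of a conv feature map,
# and the best-matching patch drives the class score.
import torch

batch, channels, h, w = 2, 128, 7, 7
num_prototypes = 5

feature_map = torch.randn(batch, channels, h, w)       # conv features per image
prototypes = torch.randn(num_prototypes, channels)     # learned part-prototypes

patches = feature_map.flatten(2).transpose(1, 2)       # (batch, h*w, channels)
distances = torch.cdist(patches, prototypes.unsqueeze(0).expand(batch, -1, -1))
similarities = (-distances).amax(dim=1)                # best patch per prototype
print(similarities.shape)  # (batch, num_prototypes) -> fed into a linear classifier
```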

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Sentence-BERT (SBERT) is presented, a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
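
A minimal usage sketch of this approach via the sentence-transformers package is shown below; the checkpoint name `all-MiniLM-L6-v2` is chosen for illustration and any SBERT model would do.

```python
# Minimal usage sketch of the described approach via the sentence-transformers
# package (model name chosen for illustration).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["The movie was great.", "I really enjoyed the film.", "The weather is cold."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantically meaningful embeddings can be compared with cosine similarity.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the paraphrase should score higher than the unrelated sentence
```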

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn a range of NLP tasks without any explicit supervision when trained on WebText, a new dataset of millions of webpages, suggesting a promising path towards building language processing systems that learn to perform tasks from their naturally occurring demonstrations.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT, a new language representation model, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
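
The "one additional output layer" fine-tuning recipe can be sketched with the Hugging Face transformers library as below; model name, labels, and hyperparameters are placeholders.

```python
# Sketch of fine-tuning BERT with a single classification head on top
# (placeholder data and labels; a real setup would use a training loop and optimizer).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a great movie", "a terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)   # classification head on top of the pooled representation
outputs.loss.backward()                   # fine-tunes encoder and new output layer together
print(outputs.logits.shape)               # (2, num_labels)
```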

Deep Learning for Case-based Reasoning through Prototypes: A Neural Network that Explains its Predictions

This work creates a novel network architecture for deep learning that naturally explains its own reasoning for each prediction, and the explanations are loyal to what the network actually computes.

Making deep neural networks right for the right scientific reasons by interacting with their explanations

The novel learning setting of explanatory interactive learning is introduced, its benefits are illustrated on a plant phenotyping research task, and it is demonstrated that explanatory interactive learning can help to avoid Clever Hans moments in machine learning.
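
One common way to turn such corrective feedback into a training signal is an input-gradient penalty on features the annotator marked as irrelevant, in the spirit of "right for the right reasons" losses; the toy sketch below illustrates the idea and is not the paper's exact implementation.

```python
# Illustrative sketch: penalise the model for relying on input features the
# user marked as irrelevant, alongside the usual task loss.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))

x = torch.randn(4, 10, requires_grad=True)
y = torch.randint(0, 2, (4,))
mask = torch.zeros(4, 10)
mask[:, :3] = 1.0                      # features the annotator marked as irrelevant

logits = model(x)
task_loss = F.cross_entropy(logits, y)
grads = torch.autograd.grad(logits.sum(), x, create_graph=True)[0]
explanation_loss = (mask * grads).pow(2).sum()   # penalise gradients on masked features

(task_loss + 0.1 * explanation_loss).backward()
```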

What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models

The behaviour of non-target heads is examined, that is, the output of heads when given input belonging to a different task than the one they were trained for; the results suggest that multi-task training leads to non-trivial extrapolation of skills, which can be harnessed for interpretability and generalization.

Evaluating Saliency Methods for Neural Language Models

Through this evaluation, various ways in which saliency methods can yield low-quality interpretations are identified, and it is recommended that future work deploying such methods to neural language models carefully validate their interpretations before drawing insights.
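
For concreteness, the sketch below computes a simple gradient-times-input saliency score per token for a sequence classifier built on a transformer LM; the checkpoint is a placeholder, and this is only one of the saliency methods such evaluations consider.

```python
# Minimal gradient-x-input saliency sketch for a transformer-based classifier
# (illustrative; not one of the paper's exact experimental setups).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("The film was surprisingly good", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits

logits[0, logits.argmax()].backward()                  # gradient of the predicted class
saliency = (embeds.grad * embeds).sum(-1).squeeze(0)   # gradient x input per token
for tok, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()), saliency.tolist()):
    print(f"{tok:>12s} {score:+.4f}")
```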

An Empirical Comparison of Instance Attribution Methods for NLP

It is found that simple retrieval methods yield training instances that differ from those identified via gradient-based methods such as influence functions (IFs), but that nonetheless exhibit desirable characteristics similar to those of more complex attribution methods.
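
The "simple retrieval" baseline can be sketched as a nearest-neighbour search in embedding space over the training set, as below; the model and texts are placeholders, and this is an illustration rather than the paper's setup.

```python
# Sketch of retrieval-based instance attribution: attribute a prediction to the
# training instances closest to the test input in embedding space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
train_texts = ["great plot and acting", "boring and too long", "a delightful surprise"]
test_text = "what a pleasant surprise this film was"

train_emb = model.encode(train_texts, convert_to_tensor=True)
test_emb = model.encode(test_text, convert_to_tensor=True)

scores = util.cos_sim(test_emb, train_emb)[0]
for rank in scores.argsort(descending=True).tolist():
    print(f"{scores[rank].item():.3f}  {train_texts[rank]}")
```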

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

Recommendations are provided, including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, and carrying out pre-development exercises that evaluate how the planned approach fits into research and development goals and supports stakeholder values.