Interpreting BERT-based Text Similarity via Activation and Saliency Maps

Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Jonathan Weill, Noam Koenigstein
Proceedings of the ACM Web Conference 2022
Recently, there has been growing interest in the ability of Transformer-based models to produce meaningful embeddings of text with several applications, such as text similarity. Despite significant progress in the field, the explanations for similarity predictions remain challenging, especially in unsupervised settings. In this work, we present an unsupervised technique for explaining paragraph similarities inferred by pre-trained BERT models. By looking at a pair of paragraphs, our technique… 
1 Citation

Similarity Calculation via Passage-Level Event Connection Graph

When text similarity is measured from a passage-level event representation perspective, the calculation achieves better results than unsupervised methods and results comparable to some supervised neuron-based methods.



RecoBERT: A Catalog Language Model for Text-Based Recommendations

This work introduces RecoBERT, a BERT-based approach for learning catalog-specialized language models for text-based item recommendations, and suggests novel training and inference procedures for scoring similarities between pairs of items that do not require item similarity labels.

Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

A novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction, obtaining significant improvements over state-of-the-art alternatives.

Sanity Checks for Saliency Maps

It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.

exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models

exBERT provides insights into the meaning of the contextual representations and attention by matching a human-specified input to similar contexts in large annotated datasets, and can quickly replicate findings from the literature and extend them to previously unanalyzed models.

Neural Attentive Multiview Machines

NAM is a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism that harnesses multiple information sources and automatically quantifies their relevancy with respect to a supervised task.

Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

SDR is a self-supervised method for document similarity that can be applied to documents of arbitrary length, including extremely long documents that exceed the 4,096 maximal token limit of Longformer.

Maximal Multiverse Learning for Promoting Cross-Task Generalization of Fine-Tuned Language Models

An extensive inter- and intra-dataset evaluation is conducted, showing that the method improves the generalization ability of BERT, sometimes leading to a +9% gain in accuracy.

Rationalizing Neural Predictions

The approach combines two modular components, a generator and an encoder, which are trained to operate well together: the generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.