Visual Explanation for Deep Metric Learning

Sijie Zhu, Taojiannan Yang, Chen Chen · IEEE Transactions on Image Processing
This work explores visual explanation for deep metric learning and its applications. As an important problem in representation learning, metric learning has attracted much attention recently, yet the interpretation of metric learning models is not as well studied as that of classification models. To this end, we propose an intuitive idea: show which regions contribute the most to the overall similarity of two input images by decomposing the final activation. Instead of only providing the overall…
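The decomposition idea can be sketched concretely for the common case of an embedding produced by global average pooling (GAP) over convolutional features, with dot-product similarity. Under that assumption the overall similarity splits exactly into per-location contributions; the paper's precise weighting and normalization may differ, so treat this as a minimal sketch:

```python
import numpy as np

def similarity_decomposition(feat1, feat2):
    """Decompose the dot-product similarity of two GAP embeddings into
    per-location contribution maps (a sketch of the decomposition idea;
    assumes GAP pooling and dot-product similarity).

    feat1, feat2: conv feature maps of shape (H, W, C).
    Returns (similarity, map1, map2); each map sums exactly to `similarity`.
    """
    e1 = feat1.mean(axis=(0, 1))   # GAP embedding of image 1, shape (C,)
    e2 = feat2.mean(axis=(0, 1))   # GAP embedding of image 2, shape (C,)
    sim = float(e1 @ e2)           # overall similarity score
    h1, w1, _ = feat1.shape
    h2, w2, _ = feat2.shape
    # Each location's feature vector, dotted with the *other* image's
    # embedding, gives that location's share of the overall similarity.
    map1 = (feat1 @ e2) / (h1 * w1)
    map2 = (feat2 @ e1) / (h2 * w2)
    return sim, map1, map2
```

Because GAP and the dot product are both linear, the contribution maps are exact: summing either map recovers the similarity score, so the heatmap answers "where does the similarity come from" rather than approximating it.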


Adapting Grad-CAM for Embedding Networks

This work proposes an adaptation of the Grad-CAM method for embedding networks, and develops an efficient weight-transfer method to explain decisions for any image without back-propagation.

Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation

It is shown that improvements in metric learning techniques significantly boost the performance regardless of the alignment, and a novel method to estimate the orientation/alignment between a pair of cross-view images with unknown alignment information is proposed.

Visual Kinship Recognition: A Decade in the Making

The public resources and data challenges that enabled and inspired many to hone in on one or more views of automatic kinship recognition in the visual domain are reviewed, and a stronghold for the state of progress is established for the different problems in a consistent manner.

Survey on the Analysis and Modeling of Visual Kinship: A Decade in the Making

The public resources and data challenges that enabled and inspired many to hone in on the views of automatic kinship recognition in the visual domain are reviewed, and a stronghold for the state of progress for the different problems is established.

Complex-valued Iris Recognition Network

This work designs a fully complex-valued neural network that can better capture the multi-scale, multi-resolution, and multi-orientation phase and amplitude features of the iris texture, and exploits visualization schemes to convey how the complex-valued network, in comparison to standard real-valued networks, extracts fundamentally different features from the iris texture.


Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning

The theory of fair credit assignment provides a unique axiomatic solution that generalizes several existing recommendation and metric-explainability techniques in the literature and derives methods that sidestep these shortcomings and naturally handle counterfactual information.

X-MIR: EXplainable Medical Image Retrieval

This work evaluates three saliency algorithms, which are either occlusion-based, attention-based, or rely on a form of activation mapping, and develops quantitative evaluation metrics that go beyond simple qualitative comparisons of the different saliency algorithms.

Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations

A new interpretation method is proposed that explains image similarity models through saliency maps and attribute words using the proposed erasing model; it can also be applied to evidential learning cases, e.g. finding the most characteristic attributes in a set of face images.

Attention-based Dynamic Subspace Learners

This work presents Dynamic Subspace Learners, which dynamically exploit multiple learners by removing the need to know a priori the number of learners, aggregating new subspace learners during training.

Sanity Checks for Saliency Maps

It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.
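The model-randomization test can be sketched in a few lines: compute the saliency map for the trained model and again after randomizing its weights, then compare the two maps with a rank correlation. A saliency method that genuinely depends on the model should decorrelate; a high correlation flags model-independence. `saliency_fn`, `model`, and `randomized_model` below are placeholder names for any attribution method and model pair, not an API from the paper:

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman rank correlation between two flattened saliency maps
    (simple rank transform; assumes no tied values)."""
    ra = np.argsort(np.argsort(a.ravel())).astype(float)
    rb = np.argsort(np.argsort(b.ravel())).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def model_randomization_check(saliency_fn, model, randomized_model, x):
    """Sanity-check sketch: compare saliency maps before and after weight
    randomization; a correlation near 1 means the method ignores the model."""
    return rank_correlation(saliency_fn(model, x),
                            saliency_fn(randomized_model, x))
```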

Interpreting Deep Visual Representations via Network Dissection

Network Dissection is described, a method that interprets networks by providing meaningful labels to their individual units that reveals that deep representations are more transparent and interpretable than they would be under a random equivalently powerful basis.

Learning Deep Features for Discriminative Localization

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability.
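The resulting Class Activation Mapping (CAM) is a one-liner once the network ends in global average pooling followed by a single fully connected layer: the localization map for a class is that class's FC weights applied channel-wise to the last conv feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y). A minimal sketch with precomputed features:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM sketch: weight each channel of the last conv feature maps by the
    target class's fully connected weights and sum over channels.

    features:   last conv feature maps, shape (H, W, C)
    fc_weights: weights of the final FC layer, shape (num_classes, C)
    class_idx:  target class index
    """
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    return features @ fc_weights[class_idx]
```

In practice the map is then upsampled to the input resolution and overlaid on the image to highlight the discriminative regions.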

Interpretable Basis Decomposition for Visual Explanation

A new framework called Interpretable Basis Decomposition for providing visual explanations for classification networks is proposed, decomposing the neural activations of the input image into semantically interpretable components pre-trained from a large concept corpus.

Visualizing Deep Similarity Networks

The visualizations show how fine-tuned similarity networks learn to focus on different features; the approach generalizes to embedding networks that use different pooling strategies and provides a simple mechanism to support image similarity searches on objects or sub-regions of the query image.

Understanding deep image representations by inverting them

Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. Nevertheless, our understanding of…

Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks

This paper proposes Grad-CAM++, which generates a visual explanation for a given class label by weighting the last convolutional layer's feature maps with a combination of the positive partial derivatives of that class score with respect to the maps, providing better visual explanations of CNN model predictions.
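Given precomputed activations and gradients, the Grad-CAM++ channel weights can be sketched as follows. The closed-form pixel-wise coefficients alpha assume the paper's special case of an exponential top activation, under which the higher-order derivatives reduce to powers of the first-order gradients; a real implementation would obtain `grads` by back-propagation through the network:

```python
import numpy as np

def grad_cam_plus_plus(activations, grads):
    """Grad-CAM++ sketch from precomputed quantities.

    activations: last conv feature maps, shape (H, W, C)
    grads:       dY/dA for the target class score, shape (H, W, C)
    """
    grads_2 = grads ** 2
    grads_3 = grads ** 3
    # Closed-form pixel-wise coefficients alpha (exponential-output special
    # case), with a guard against division by zero.
    denom = 2.0 * grads_2 + activations.sum(axis=(0, 1)) * grads_3
    alpha = np.where(denom != 0, grads_2 / np.where(denom != 0, denom, 1.0), 0.0)
    # Channel weights: alpha-weighted sum of the *positive* partial derivatives.
    weights = (alpha * np.maximum(grads, 0)).sum(axis=(0, 1))   # shape (C,)
    # Final map: ReLU of the weighted channel combination.
    return np.maximum(activations @ weights, 0)
```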

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning

A General Pair Weighting framework is established, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions.
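The Multi-Similarity loss itself, one instance of that pair-weighting view, can be sketched per batch as a soft-weighted log-sum-exp over positive and negative pair similarities. The hyperparameter values and the paper's similarity-based pair mining step are omitted here, so this is a simplified sketch rather than the full method:

```python
import numpy as np

def multi_similarity_loss(sims, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-Similarity loss sketch for one batch (pair mining omitted).

    sims:   (B, B) cosine-similarity matrix of the batch embeddings
    labels: (B,) integer class labels
    """
    B = len(labels)
    total = 0.0
    for i in range(B):
        pos = [sims[i, j] for j in range(B) if j != i and labels[j] == labels[i]]
        neg = [sims[i, j] for j in range(B) if labels[j] != labels[i]]
        # Soft weighting: hard positives (low similarity) and hard negatives
        # (high similarity) dominate each log-sum-exp term.
        pos_term = np.log1p(np.exp(-alpha * (np.array(pos) - lam)).sum()) / alpha if pos else 0.0
        neg_term = np.log1p(np.exp(beta * (np.array(neg) - lam)).sum()) / beta if neg else 0.0
        total += pos_term + neg_term
    return total / B
```

The gradient of each term recovers a per-pair weight, which is exactly how the General Pair Weighting framework unifies this loss with contrastive, triplet, and lifted-structure losses.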

Interpretable Explanations of Black Boxes by Meaningful Perturbation

A general framework for learning different kinds of explanations for any black-box algorithm is proposed, and the framework is specialised to find the part of an image most responsible for a classifier's decision.
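That specialisation amounts to optimising a deletion mask m in [0, 1]: minimise the class score of the perturbed image plus an L1 penalty that keeps the mask small. The sketch below uses a caller-supplied `grad_fn` as a stand-in for back-propagation through a real model, and zeroes the masked pixels where the paper would blur them and add regularisers for mask smoothness:

```python
import numpy as np

def meaningful_perturbation_mask(image, grad_fn, lam=0.1, lr=0.1, steps=200):
    """Meaningful-perturbation sketch: learn the smallest mask m (in [0, 1])
    minimising  score(image * (1 - m)) + lam * |m|,  i.e. deleting the masked
    region should drop the model's score the most.

    grad_fn(x) stands in for d(score)/dx of a differentiable model.
    """
    m = np.zeros_like(image)
    for _ in range(steps):
        x_pert = image * (1.0 - m)
        # Chain rule: d/dm score(image * (1 - m)) = -image * dscore/dx,
        # plus the subgradient of the L1 sparsity penalty.
        g = -image * grad_fn(x_pert) + lam * np.sign(m)
        m = np.clip(m - lr * g, 0.0, 1.0)   # projected gradient descent
    return m
```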