Convolutional Dynamic Alignment Networks for Interpretable Classifications

@article{boehle2021coda,
  title={Convolutional Dynamic Alignment Networks for Interpretable Classifications},
  author={Moritz D. Boehle and Mario Fritz and Bernt Schiele},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a high degree of inherent interpretability. Their core building blocks are Dynamic Alignment Units (DAUs), which linearly transform their input with weight vectors that dynamically align with task-relevant patterns. As a result, CoDA-Nets model the classification prediction through a series of input-dependent linear transformations, allowing for…
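The key property described in the abstract, an input-dependent but still linear transformation, can be illustrated with a minimal sketch. This is not the paper's exact DAU formulation (which the abstract does not specify in full); it only assumes that the weight vector is itself a low-rank linear function of the input and is normalised, so that the output decomposes exactly into per-feature contributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_alignment_unit(x, A, B):
    """Illustrative Dynamic Alignment Unit (simplified sketch).

    The weight vector is itself a function of the input
    (here w(x) = A @ B @ x, a low-rank map), so the unit's output
    w(x)^T x is an input-dependent linear function of x. This is
    what lets the prediction be decomposed into linear contributions.
    """
    w = A @ (B @ x)                      # input-dependent weight vector
    w = w / (np.linalg.norm(w) + 1e-8)   # normalise to bound the output
    return w @ x, w                      # scalar output, effective weights

d, r = 8, 3                  # input dimension and rank (illustrative sizes)
x = rng.normal(size=d)
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, d))

out, w_eff = dynamic_alignment_unit(x, A, B)
# The output is exactly w_eff @ x, so per-feature contributions
# w_eff * x sum to the prediction:
assert np.isclose(out, np.sum(w_eff * x))
```

Because the effective weights depend on `x`, the unit is nonlinear overall, yet for any fixed input the prediction is a plain dot product whose summands can be read off as feature attributions.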
1 Citation
A comparison of deep saliency map generators on multispectral data in object detection
This work investigates how the maps produced by three saliency map generators differ across spectra, and examines how they perform when used for object detection.


References
Visualizing and Understanding Convolutional Networks
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large convolutional network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Striving for Simplicity: The All Convolutional Net
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
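The substitution described here, a strided convolution in place of max-pooling, can be sketched directly. The sketch below uses a fixed averaging kernel purely for illustration; in the all-convolutional net the kernel is learned:

```python
import numpy as np

def strided_conv2d(img, kernel, stride=2):
    """Plain strided 2D convolution (valid padding), single channel."""
    kh, kw = kernel.shape
    h = (img.shape[0] - kh) // stride + 1
    w = (img.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = img[i * stride:i * stride + kh,
                        j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(img, size=2):
    """Standard non-overlapping max-pooling."""
    h, w = img.shape[0] // size, img.shape[1] // size
    return img[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
pooled = max_pool2d(img)                                     # 3x3 via max-pooling
convd = strided_conv2d(img, np.ones((2, 2)) / 4, stride=2)   # 3x3 via strided conv
assert pooled.shape == convd.shape == (3, 3)
```

Both paths downsample by the same factor; the strided convolution simply replaces the fixed max with a learnable weighted sum, which is why the swap costs no expressiveness.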
Dynamic Filter Networks
The Dynamic Filter Network is introduced, where filters are generated dynamically conditioned on an input, and it is shown that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.
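The core of Grad-CAM is small enough to sketch: global-average-pool the gradients of the class score with respect to a convolutional layer's activations, use the result to weight the activation maps, and apply a ReLU. The activations and gradients below are random placeholders standing in for values a real backward pass would produce:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from a conv layer's activations and the
    gradients of the class score w.r.t. those activations.

    activations, gradients: arrays of shape (channels, H, W).
    """
    # One importance weight per channel: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))             # (channels,)
    # Weighted sum of activation maps, then ReLU to keep only
    # features with a positive influence on the class score.
    cam = np.tensordot(weights, activations, axes=1)  # (H, W)
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(1)
acts = rng.random((4, 7, 7))        # e.g. last conv layer of a CNN
grads = rng.normal(size=(4, 7, 7))  # d(score)/d(activations), from backprop
heatmap = grad_cam(acts, grads)
assert heatmap.shape == (7, 7) and (heatmap >= 0).all()
```

In practice the heat map is upsampled to the input resolution and overlaid on the image; the computation itself is only this channel-weighted sum.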
Learning Important Features Through Propagating Activation Differences
DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input, is presented.
Full-Gradient Representation for Neural Network Visualization
This work introduces a new tool for interpreting neural nets, namely full-gradients, which decomposes the neural net response into input sensitivity and per-neuron sensitivity components, and proposes an approximate saliency map representation for convolutional nets dubbed FullGrad, obtained by aggregating the full-gradient components.
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
This paper addresses the visualisation of image classification models learnt using deep Convolutional Networks (ConvNets), and establishes the connection between gradient-based ConvNet visualisation methods and deconvolutional networks.
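The saliency maps this paper introduces are the absolute gradients of a class score with respect to the input pixels. The sketch below uses a linear scorer so the gradient is exact and inspectable; for a deep ConvNet the same quantity is obtained by backpropagation:

```python
import numpy as np

def saliency_map(x, W, c):
    """Gradient-based saliency for class c of a linear scorer s = W @ x.

    For a linear model, d(s_c)/dx is simply row c of W; the saliency
    is the per-feature magnitude of that gradient.
    """
    grad = W[c]
    return np.abs(grad)

W = np.array([[0.5, -2.0, 0.1],
              [1.5,  0.3, -0.7]])
x = np.array([1.0, 2.0, 3.0])
sal = saliency_map(x, W, c=0)
# Feature 1 has the largest |gradient| for class 0, so it is
# the pixel the score is most sensitive to:
assert sal.argmax() == 1
```

Note that this ranks inputs by local sensitivity of the score, not by their contribution to the prediction, which is exactly the distinction later attribution methods (e.g. LRP, DeepLIFT) set out to address.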
This looks like that: deep learning for interpretable image recognition
A deep network architecture, the prototypical part network (ProtoPNet), is presented that reasons in a way similar to how ornithologists, physicians, and others would explain how to solve challenging image classification tasks, providing a level of interpretability that is absent in other interpretable deep models.
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
A high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain is introduced; it behaves similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152, or DenseNet-169 in terms of feature sensitivity, error distribution, and interactions between image parts.
On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, introducing a methodology that makes it possible to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag-of-Words features and for multilayered neural networks.
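The pixel-wise decomposition can be sketched for a single linear layer using the standard LRP epsilon-rule (one of the propagation rules from this line of work): each output neuron's relevance is redistributed to the inputs in proportion to their contributions W_ij * x_j. This is a one-layer illustration, not the full multilayer procedure:

```python
import numpy as np

def lrp_epsilon(x, W, b, relevance_out, eps=1e-6):
    """LRP epsilon-rule for one linear layer z = W @ x + b.

    Redistributes each output neuron's relevance to the inputs in
    proportion to each input's contribution W_ij * x_j to that neuron;
    eps stabilises the division for near-zero activations.
    """
    z = W @ x + b                                  # forward pass
    s = relevance_out / (z + eps * np.sign(z))     # stabilised ratio
    return x * (W.T @ s)                           # relevance per input

x = np.array([1.0, 2.0, 0.5])
W = np.array([[0.2, -0.5, 1.0],
              [0.7,  0.1, -0.3]])
b = np.zeros(2)
R_out = W @ x            # take the output activations as the relevance
R_in = lrp_epsilon(x, W, b, R_out)
# With zero bias and small eps, relevance is (approximately) conserved
# as it flows back through the layer:
assert np.isclose(R_in.sum(), R_out.sum(), atol=1e-4)
```

Chaining this rule layer by layer from the output score down to the input yields the pixel-wise heat map; the conservation property checked above is what makes the result interpretable as a decomposition of the prediction.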