Corpus ID: 231718721

Shape or Texture: Understanding Discriminative Features in CNNs

@article{Islam2021ShapeOT,
  title={Shape or Texture: Understanding Discriminative Features in CNNs},
  author={Md. Amirul Islam and Matthew Kowal and Patrick Esser and Sen Jia and Bj{\"o}rn Ommer and Konstantinos G. Derpanis and Neil D. B. Bruce},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.11604}
}
Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a ‘texture bias’: given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture. However, these previous studies conduct experiments on the final classification output of the network, and fail to robustly evaluate… 
Shape Defense Against Adversarial Attacks
TLDR
This work explores how shape bias can be incorporated into CNNs to improve their robustness and shows that edge information can a) benefit other adversarial training methods, b) be even more effective in conjunction with background subtraction, c) be used to defend against poisoning attacks, and d) make CNNs more robust against natural image corruptions.
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
In this paper, we challenge the common assumption that collapsing the spatial dimensions of a 3D (spatial-channel) tensor in a convolutional neural network (CNN) into a vector via global pooling…
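The snippet above refers to collapsing a spatial-channel tensor into a vector via global pooling. As a rough illustration of that operation (a numpy-only sketch under assumed shapes, not the paper's experimental setup), global average pooling reduces a (channels, height, width) tensor to one value per channel:

```python
import numpy as np

# Illustrative sketch: global average pooling collapses the spatial
# dimensions of a (C, H, W) feature tensor into a length-C vector.
def global_avg_pool(x: np.ndarray) -> np.ndarray:
    """x has shape (C, H, W); returns shape (C,)."""
    return x.mean(axis=(1, 2))

feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
pooled = global_avg_pool(feat)
print(pooled.shape)  # (2,)
```

The paper's claim concerns what survives this collapse: even though the spatial axes are gone, information can still be distributed across the channel axis.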
Position, Padding and Predictions: A Deeper Look at Position Information in CNNs
TLDR
This paper shows that a surprising degree of absolute position information is encoded in commonly used CNNs, and shows that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding.
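The zero-padding mechanism described in this summary can be demonstrated with a toy sketch (numpy only, not the paper's experiment): convolving a perfectly uniform image with a uniform kernel under zero ("same") padding produces different responses at corners, edges, and the interior, so the activations alone reveal absolute position.

```python
import numpy as np

# Toy "same"-padded 2D convolution: zero padding shrinks the effective
# receptive-field sum near the borders, making activations position-dependent
# even for a constant input.
def conv2d_same(img, kernel):
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2,), (kw // 2,)), mode="constant")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

img = np.ones((5, 5))
out = conv2d_same(img, np.ones((3, 3)))
print(out[0, 0], out[0, 2], out[2, 2])  # corner, edge, interior: 4.0 6.0 9.0
```

Without padding, every output position would see a full 3x3 window of ones and respond identically, which matches the summary's point that a lack of padding precludes this kind of position encoding.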
Intriguing Properties of Vision Transformers
TLDR
Effective features of ViTs are shown to be due to flexible and dynamic receptive fields possible via self-attention mechanisms, leading to high accuracy rates across a range of classification datasets in both traditional and few-shot learning paradigms.
Graph Jigsaw Learning for Cartoon Face Recognition
TLDR
The proposed GraphJigsaw constructs jigsaw puzzles at various stages of the classification network and solves them with a graph convolutional network (GCN) in a progressive manner, avoiding training the classification model on deconstructed images that would introduce noisy patterns harmful to the final classification.
SegMix: Co-occurrence Driven Mixup for Semantic Segmentation and Adversarial Robustness
TLDR
A strategy is presented for training convolutional neural networks to effectively resolve interference arising from competing hypotheses relating to inter-categorical information throughout the network, based on the notion of feature binding.
Simpler Does It: Generating Semantic Labels with Objectness Guidance
TLDR
This work presents a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model, and proposes an end-to-end multi-task learning strategy that jointly learns to segment semantics and objectness using the generated pseudo-labels.
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
TLDR
A multi-level feature optimization framework is proposed to improve the generalization and temporal modeling ability of learned video representations, together with a simple temporal modeling module built on multi-level features to enhance motion pattern learning.
A Game-Theoretic Taxonomy of Visual Concepts in DNNs
TLDR
This paper rethinks how a DNN encodes visual concepts of different complexities from a new perspective, i.e., the game-theoretic multi-order interactions between pixels in an image, and provides a new taxonomy of visual concepts, which helps to interpret the encoding of shapes and textures in terms of concept complexities.
Artificial Intelligence-Based Detection, Classification and Prediction/Prognosis in PET Imaging: Towards Radiophenomics
TLDR
This work reviews AI-based techniques, with a special focus on oncological PET and PET/CT imaging, for different detection, classification, and prediction/prognosis tasks, and discusses the efforts needed to enable the translation of AI techniques into routine clinical workflows.

References

Showing 1-10 of 33 references
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
TLDR
It is shown that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.
Exploring the Origins and Prevalence of Texture Bias in Convolutional Neural Networks
TLDR
This work finds that, when trained on datasets of images with conflicting shape and texture, the inductive bias of CNNs often favors shape; in general, models learn shape at least as easily as texture.
Describing Textures in the Wild
TLDR
This work identifies a vocabulary of forty-seven texture terms and uses them to describe a large dataset of patterns collected "in the wild", and shows that they both outperform specialized texture descriptors not only on this problem, but also in established material recognition datasets.
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
TLDR
A high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain is introduced; it behaves similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152, or DenseNet-169 in terms of feature sensitivity, error distribution, and interactions between image parts.
Network Dissection: Quantifying Interpretability of Deep Visual Representations
TLDR
This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks when trained to solve different classification problems.
Striving for Simplicity: The All Convolutional Net
TLDR
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
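The substitution summarized above can be sketched in a few lines (an assumed toy setup in numpy, not the paper's architecture): a 2x2 max-pool with stride 2 and a 2x2 convolution with stride 2 both halve the spatial resolution; the all-convolutional design simply lets the striding layer learn its weights instead of taking a fixed maximum.

```python
import numpy as np

# Reduce each non-overlapping 2x2 block of a (H, W) map with a given
# reduction, mimicking a stride-2, kernel-2 downsampling layer.
def pool2x2(x, reduce_fn):
    h, w = x.shape
    return reduce_fn(x.reshape(h // 2, 2, w // 2, 2), axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2x2(x, np.max)   # fixed max over each 2x2 block
avg_conv = pool2x2(x, np.mean)    # equals a stride-2 conv with uniform 0.25 weights
print(max_pooled.shape, avg_conv.shape)  # both (2, 2)
```

The averaging variant is exactly what a stride-2 convolution with all weights fixed to 0.25 would compute; in the all-convolutional net those weights are learned rather than fixed.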
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Fully Convolutional Networks for Semantic Segmentation
TLDR
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Visualizing and Understanding Convolutional Networks
TLDR
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it finds model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Hypercolumns for object segmentation and fine-grained localization
TLDR
This work defines the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, uses hypercolumns as pixel descriptors, and shows results on fine-grained localization tasks, including simultaneous detection and segmentation and keypoint localization.