• Corpus ID: 231718721

Shape or Texture: Understanding Discriminative Features in CNNs

@article{Islam2021ShapeOT,
  title={Shape or Texture: Understanding Discriminative Features in CNNs},
  author={Md. Amirul Islam and Matthew Kowal and Patrick Esser and Sen Jia and Bj{\"o}rn Ommer and Konstantinos G. Derpanis and Neil D. B. Bruce},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.11604}
}
Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a ‘texture bias’: given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture. However, these previous studies conduct experiments on the final classification output of the network, and fail to robustly evaluate… 
Shape Defense Against Adversarial Attacks
TLDR
This work explores how shape bias can be incorporated into CNNs to improve their robustness and shows that edge information can a) benefit other adversarial training methods, b) be even more effective in conjunction with background subtraction, c) be used to defend against poisoning attacks, and d) make CNNs more robust against natural image corruptions.
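The edge-information idea in this TLDR can be sketched with a toy example. The snippet below (illustrative only, pure NumPy; `sobel_edges` and `add_edge_channel` are hypothetical helper names, not from the paper) computes a Sobel gradient-magnitude map and stacks it with the image as an extra input channel, one plausible way shape cues could be fed to a CNN:

```python
import numpy as np

def sobel_edges(img):
    """Sobel gradient-magnitude edge map for a 2D grayscale image.

    Border pixels are left at zero for simplicity.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

def add_edge_channel(img):
    """Stack the image with its edge map -> (H, W, 2) network input."""
    return np.stack([img.astype(float), sobel_edges(img)], axis=-1)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                  # vertical step edge at column 4
x = add_edge_channel(img)
print(x.shape)                    # (8, 8, 2)
print(x[4, 4, 1] > x[4, 1, 1])    # True: edge channel peaks at the boundary
```

The edge channel responds only where intensity changes, so it carries shape/contour information while being largely invariant to the textures the cited work aims to de-emphasize.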
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
In this paper, we challenge the common assumption that collapsing the spatial dimensions of a 3D (spatial-channel) tensor in a convolutional neural network (CNN) into a vector via global pooling removes all spatial information, showing instead that position information survives pooling because it is encoded channel-wise.
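The assumption this paper challenges can be made concrete with a toy sketch (pure NumPy, illustrative only): within a single channel, global average pooling (GAP) maps two feature maps with different spatial layouts to the same value, so any position information that survives GAP must be distributed across channels, as the title states.

```python
import numpy as np

# Two single-channel feature maps with identical per-channel means but
# different spatial layouts: the activation sits top-left vs bottom-right.
a = np.zeros((1, 4, 4)); a[0, 0, 0] = 16.0
b = np.zeros((1, 4, 4)); b[0, 3, 3] = 16.0

gap = lambda t: t.mean(axis=(1, 2))  # global average pooling: (C, H, W) -> (C,)

print(gap(a), gap(b))                # both [1.]
print(np.allclose(gap(a), gap(b)))   # True: GAP cannot tell the two apart
```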
Position, Padding and Predictions: A Deeper Look at Position Information in CNNs
TLDR
This paper shows that a surprising degree of absolute position information is encoded in commonly used CNNs, and shows that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding.
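The zero-padding mechanism described above can be seen in a minimal sketch (pure NumPy, illustrative only; `conv2d_same` is a hypothetical helper, not from the paper): convolving a perfectly uniform input with "same" zero padding already yields location-dependent responses at the borders, so the network receives an absolute-position signal even without content cues.

```python
import numpy as np

def conv2d_same(img, k):
    """'Same'-size 2D cross-correlation with zero padding."""
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # zeros by default
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * k)
    return out

img = np.ones((5, 5))             # uniform input: no content to distinguish pixels
out = conv2d_same(img, np.ones((3, 3)))
print(out[2, 2], out[0, 0])       # 9.0 at the interior, 4.0 at the corner
```

The interior response (9.0) differs from the corner response (4.0) purely because the zero padding leaks into border windows, which is exactly how padding can drive position encoding.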
Intriguing Properties of Vision Transformers
TLDR
The effective features of ViTs are shown to be due to flexible and dynamic receptive fields made possible by self-attention mechanisms, leading to high accuracy across a range of classification datasets in both traditional and few-shot learning paradigms.
Graph Jigsaw Learning for Cartoon Face Recognition
TLDR
The proposed GraphJigsaw constructs jigsaw puzzles at various stages of the classification network and solves them with a graph convolutional network (GCN) in a progressive manner, avoiding training the classification model on deconstructed images, which would introduce noisy patterns harmful to the final classification.
Robust Contrastive Learning Using Negative Samples with Diminished Semantics
TLDR
This paper develops two methods, texture-based and patch-based augmentations, to generate negative samples, and shows that texture features are indispensable for classifying particular ImageNet classes, especially fine-grained ones.
SegMix: Co-occurrence Driven Mixup for Semantic Segmentation and Adversarial Robustness
TLDR
A strategy for training convolutional neural networks to effectively resolve interference arising from competing hypotheses relating to inter-categorical information throughout the network, based on the notion of feature binding.
Simpler Does It: Generating Semantic Labels with Objectness Guidance
TLDR
This work presents a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model, and proposes an end-to-end multi-task learning strategy that jointly learns to segment semantics and objectness using the generated pseudo-labels.
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
TLDR
It is demonstrated that simple last-layer retraining on large ImageNet-trained models can match or outperform state-of-the-art approaches on spurious correlation benchmarks, but with substantially lower complexity and computational expense.
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
  • Rui Qian, Yuxi Li, Weiyao Lin
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
TLDR
A multi-level feature optimization framework is proposed to improve the generalization and temporal modeling ability of learned video representations, along with a simple temporal modeling module built from multi-level features to enhance motion pattern learning.

References

SHOWING 1-10 OF 33 REFERENCES
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
TLDR
It is shown that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.
Exploring the Origins and Prevalence of Texture Bias in Convolutional Neural Networks
TLDR
This work finds that, when trained on datasets of images with conflicting shape and texture, the inductive bias of CNNs often favors shape; in general, models learn shape at least as easily as texture.
Describing Textures in the Wild
TLDR
This work identifies a vocabulary of forty-seven texture terms and uses them to describe a large dataset of patterns collected "in the wild", and shows that the resulting attribute-based representations outperform specialized texture descriptors not only on this problem but also on established material recognition datasets.
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
TLDR
A high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain is introduced; it behaves similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152, or DenseNet-169 in terms of feature sensitivity, error distribution, and interactions between image parts.
Network Dissection: Quantifying Interpretability of Deep Visual Representations
TLDR
This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks when trained to solve different classification problems.
Striving for Simplicity: The All Convolutional Net
TLDR
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
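The replacement described in this TLDR can be sketched in a toy example (pure NumPy, illustrative only): a 2x2 max-pooling layer and a 2x2 convolution with stride 2 produce outputs of identical spatial resolution, so the strided convolution can take over the downsampling role while its kernel weights remain learnable.

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling with stride 2 on a 2D feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def strided_conv2x2(x, k):
    """2x2 convolution with stride 2, no padding: same downsampling factor."""
    h, w = x.shape
    out = np.empty((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            out[i, j] = np.sum(x[2 * i:2 * i + 2, 2 * j:2 * j + 2] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
p = maxpool2x2(x)
c = strided_conv2x2(x, np.full((2, 2), 0.25))  # learnable kernel; here: averaging
print(p.shape == c.shape)                      # True: identical output resolution
```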
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Fully Convolutional Networks for Semantic Segmentation
TLDR
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Visualizing and Understanding Convolutional Networks
TLDR
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Hypercolumns for object segmentation and fine-grained localization
TLDR
Using hypercolumns as pixel descriptors, this work defines the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and shows results on fine-grained localization tasks, including simultaneous detection and segmentation and keypoint localization.