Corpus ID: 236469595

When and how CNNs generalize to out-of-distribution category-viewpoint combinations

  Authors: Spandan Madan, Timothy Henry, Jamell Dozier, Helen Ho, Nishchal Bhandari, Tomotake Sasaki, Frédo Durand, Hanspeter Pfister, Xavier Boix
Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, i.e., combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both object category and 3D viewpoint on OOD combinations, and identifying the neural mechanisms that…
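The in-distribution vs. OOD split described above can be sketched as a partition of the category-viewpoint grid. A minimal illustration (the category and viewpoint names below are invented for the example, not taken from the paper's datasets):

```python
from itertools import product

def split_combinations(categories, viewpoints, held_out):
    """Partition the category-viewpoint grid into in-distribution combinations
    (seen during training) and OOD combinations (held out for evaluation).
    `held_out` is a set of (category, viewpoint) pairs excluded from training."""
    all_combos = set(product(categories, viewpoints))
    ood = set(held_out)
    in_dist = all_combos - ood
    return in_dist, ood

# Example: train on every combination except "car from above" and "chair from the side".
in_dist, ood = split_combinations(
    ["car", "chair", "lamp"],
    ["front", "side", "above"],
    held_out={("car", "above"), ("chair", "side")},
)
```

A model trained only on `in_dist` images is then tested on `ood` pairs, so any accuracy there must come from generalizing across the grid rather than memorizing seen combinations.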


Three approaches to facilitate DNN generalization to objects in out-of-distribution orientations and illuminations: late-stopping, tuning batch normalization and invariance loss
  • Akira Sakai, Taro Sunagawa, +7 authors Tomotake Sasaki
  • Computer Science
  • ArXiv
  • 2021
It is demonstrated that even though the three approaches focus on different aspects of DNNs, they all tend to lead to the same underlying neural mechanism to enable OoD accuracy gains—individual neurons in the intermediate layers become more selective to a category and also invariant to OoD orientations and illuminations.
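The selectivity property described above is commonly quantified per unit from its class-averaged activations. A minimal sketch using one standard formulation, (mu_max − mu_rest) / (mu_max + mu_rest); the papers surveyed here may use variants of this index:

```python
import numpy as np

def class_selectivity(mean_acts):
    """Class selectivity of one unit from its per-class mean activations.
    mu_max is the highest class-averaged activation; mu_rest is the mean
    over the remaining classes. Returns ~0 for an unselective unit and
    ~1 for a unit that responds to a single class only."""
    mean_acts = np.asarray(mean_acts, dtype=float)
    mu_max = mean_acts.max()
    mu_rest = np.delete(mean_acts, mean_acts.argmax()).mean()
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)
```

For example, `class_selectivity([1.0, 0.0, 0.0])` is near 1 (one preferred class), while `class_selectivity([1.0, 1.0, 1.0])` is near 0 (no preference).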


Crafting a multi-task CNN for viewpoint estimation
This paper presents a comparison of CNN approaches in a unified setting as well as a detailed analysis of the key factors that impact performance, and presents a new joint training method with the detection task and demonstrates its benefit.
Improved Deep Learning of Object Category Using Pose Information
  • Jiaping Zhao, L. Itti
  • Computer Science
  • 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
  • 2017
A new convolutional neural network architecture, the what/where CNN (2W-CNN), built on a linear-chain feedforward CNN and augmented with hierarchical layers regularized by object poses, is introduced; it is shown mathematically that 2W-CNN has inherent advantages over AlexNet under the stochastic gradient descent (SGD) optimization procedure.
Revisiting the Importance of Individual Units in CNNs via Ablation
The results show that units with high selectivity play an important role in network classification power at the individual class level, and that class selectivity, along with other attributes, is a good predictor of the importance of a unit to individual classes.
A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation
This work studies how convolutional neural network (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation, analyzes the layers of various CNN models, and extensively compares them with the goal of discovering how the distributed representations within CNNs encode object pose information and how this contrasts with object category representations.
Why do deep convolutional networks generalize so poorly to small image transformations?
The results indicate that the problem of ensuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.
Network Dissection: Quantifying Interpretability of Deep Visual Representations
This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks when trained to solve different classification problems.
Measuring Invariances in Deep Networks
A number of empirical tests are proposed that directly measure the degree to which these learned features are invariant to different input transformations, finding that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images, and that convolutional deep belief networks learn substantially more invariant features in each layer.
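One simple way to measure the kind of transformation invariance discussed above is a variance-ratio score: compare how much a unit's activation varies across transformed versions of the same input against its total variance. This is a hedged sketch, not the specific tests proposed in the paper:

```python
import numpy as np

def transformation_invariance(acts):
    """acts: array of shape (n_inputs, n_transforms) holding one unit's
    activation for each input under each transformation.
    Returns a score in [0, 1]: 1 minus the ratio of mean within-input
    variance (across transforms) to total variance. A score near 1 means
    the response depends on the input, not on the transformation."""
    acts = np.asarray(acts, dtype=float)
    within = acts.var(axis=1).mean()  # variance across transforms, per input
    total = acts.var()                # variance over all activations
    return 1.0 - within / (total + 1e-12)
```

For instance, a unit with activations `[[1, 1], [3, 3]]` (identical across transforms) scores near 1, while `[[0, 1], [0, 1]]` (driven entirely by the transform) scores near 0.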
iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning
This work introduces a large-scale synthetic dataset, which is freely and publicly available, and uses it to answer several fundamental questions regarding selectivity and invariance properties of convolutional neural networks.
On the importance of single directions for generalization
It is found that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
Object Detectors Emerge in Deep Scene CNNs
This work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.