Understanding deep image representations by inverting them

  • Aravindh Mahendran, A. Vedaldi
  • Published 26 November 2014
  • Computer Science
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. […] We show that this method can invert representations such as HOG more accurately than recent alternatives while being applicable to CNNs too. We then use this technique to study the inverse of recent state-of-the-art CNN image representations for the first time. Among our findings, we show that several layers in CNNs retain…
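The inversion idea summarized above can be sketched as gradient descent on the input: search for an image whose representation matches a target code, under a simple regularizer. The snippet below is a toy illustration only, with a random linear map standing in for the network's feature extractor (an assumption made for brevity, not the paper's actual setup):

```python
import numpy as np

# Toy pre-image inversion in the spirit of the paper: minimize
# ||phi(x) - phi(x0)||^2 + lam * ||x||^2 by gradient descent on x.
# Here phi(x) = W @ x is a fixed random linear "representation".
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))        # stand-in feature extractor
x0 = rng.normal(size=64)             # original "image" (flattened)
target = W @ x0                      # code we want to invert

lam = 1e-3                           # regularizer weight
x = np.zeros(64)                     # start from a blank image
for _ in range(1000):
    residual = W @ x - target
    grad = 2 * W.T @ residual + 2 * lam * x
    x -= 0.002 * grad                # plain gradient descent step

# The recovered x now has (almost) the same representation as x0,
# even though x itself need not equal x0 (phi has a null space).
print(np.linalg.norm(W @ x - target))
```

Because the map has a null space, many inputs share one code; the regularizer picks a "natural" one, which is exactly the role the paper's image priors play.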

Inverting Visual Representations with Convolutional Networks

  • A. Dosovitskiy, T. Brox
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This work proposes a new approach to study image representations by inverting them with an up-convolutional neural network, and applies this method to shallow representations (HOG, SIFT, LBP), as well as to deep networks.

Inverting face embeddings with convolutional neural networks

This work uses neural networks to effectively invert low-dimensional face embeddings while producing realistic-looking, consistent images, and demonstrates that gradient-ascent-style approaches can reproduce consistent images with the help of a guiding image.

Patch Correspondences for Interpreting Pixel-level CNNs

CompNN is presented: a simple approach to visually interpreting distributed representations learned by a convolutional neural network for pixel-level tasks (e.g., image synthesis and segmentation), which reconstructs both the CNN's input and its output image by copy-pasting corresponding patches from the training set with similar feature embeddings.

Randomness in Deconvolutional Networks for Visual Representation

Compared with image inversion on a pre-trained CNN, training converges faster and the resulting network reconstructs images at higher quality, indicating that rich information is encoded in the random features.

Inverting Convolutional Networks with Convolutional Networks

This work proposes a new approach to study deep image representations by inverting them with an up-convolutional neural network; applying this method to a deep network trained on ImageNet provides numerous insights into the properties of the feature representation.

Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units

This paper proposes a novel, simple yet effective activation scheme called concatenated ReLU (CRelu), theoretically analyzes its reconstruction property in CNNs, and integrates CRelu into several state-of-the-art CNN architectures, demonstrating improved recognition performance on the CIFAR-10/100 and ImageNet datasets with fewer trainable parameters.
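The CRelu activation described above is simple enough to sketch directly: it keeps both the positive part of the input and the positive part of its negation, doubling the feature dimension so no sign information is lost. Names and shapes here are illustrative:

```python
import numpy as np

def crelu(x, axis=-1):
    """Concatenated ReLU: stack ReLU(x) and ReLU(-x) along `axis`.

    Unlike plain ReLU, this preserves negative responses (as the
    second half of the output), at the cost of doubling width.
    """
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=axis)

print(crelu(np.array([-1.0, 2.0])))  # -> [0. 2. 1. 0.]
```

The reconstruction property analyzed in the paper follows from the fact that `x` can be recovered exactly as the difference of the two halves.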

Visualizing and Understanding Deep Texture Representations

A systematic evaluation of recent CNN-based texture descriptors for recognition is presented, together with a proposed technique for visualizing pre-images, providing a means of understanding the categorical properties captured by these representations.

Understanding Deep Features with Computer-Generated Imagery

This work introduces an approach for analyzing how features generated by convolutional neural networks trained on large image datasets vary with respect to scene factors that occur in natural images, quantifies the relative importance of these factors in the CNN responses, and visualizes them using principal component analysis.

The Essence of Pose

This work explores inversion techniques similar to those in [12, 13, 3] to invert the feature descriptor learned by PoseNet, a CNN by the Stanford Computer Vision and Geometry laboratory trained to understand the 3D pose of objects.

Interpreting Deep Visual Representations via Network Dissection

Network Dissection is described, a method that interprets networks by providing meaningful labels to their individual units that reveals that deep representations are more transparent and interpretable than they would be under a random equivalently powerful basis.

Visualizing and Understanding Convolutional Networks

A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets), and establishes the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks.
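The gradient-based saliency idea summarized above ranks input pixels by the magnitude of the class score's gradient with respect to the input. The sketch below uses a tiny hand-rolled scorer `s(x) = w @ relu(x)` as a stand-in assumption for a ConvNet class score, so the analytic gradient can be checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)   # class-score weights
x = rng.normal(size=16)   # input "image" (flattened)

def saliency(w, x):
    # d/dx of (w @ relu(x)) is w * 1[x > 0]; saliency is its magnitude.
    return np.abs(w * (x > 0))

# Finite-difference check of the analytic gradient.
eps = 1e-6
num = np.array([
    (w @ np.maximum(x + eps * e, 0) - w @ np.maximum(x - eps * e, 0)) / (2 * eps)
    for e in np.eye(16)
])
print(np.allclose(np.abs(num), saliency(w, x), atol=1e-4))
```

In the paper the same quantity is obtained for a real ConvNet by one backpropagation pass through the trained network.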

Fisher Kernels on Visual Vocabularies for Image Categorization

  • F. Perronnin, C. Dance
  • Computer Science
    2007 IEEE Conference on Computer Vision and Pattern Recognition
  • 2007
This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.
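The step from bag-of-visterms to Fisher kernels described above replaces hard word counts with soft GMM posteriors and gradient statistics. The snippet below is an illustrative, unnormalized sketch of the "mean" block only, assuming equal component weights and unit variances (simplifications not made in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
means = rng.normal(size=(3, 2))   # 3-word "visual vocabulary" in 2-D
desc = rng.normal(size=(50, 2))   # 50 local descriptors from one image

# Soft assignment (posterior) of each descriptor to each Gaussian
# component, with equal weights and spherical unit covariance.
d2 = ((desc[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (50, 3)
post = np.exp(-0.5 * d2)
post /= post.sum(axis=1, keepdims=True)

# Gradient w.r.t. each component mean, averaged over descriptors:
# the (unnormalized) mean-deviation block of a Fisher vector.
fv = (post[:, :, None] * (desc[:, None, :] - means[None, :, :])).mean(0)
print(fv.shape)  # (3, 2): one block per visual word
```

Summing `post` over descriptors instead would recover a soft bag-of-visterms histogram, which is the sense in which the Fisher kernel extends it.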

Visualizing Higher-Layer Features of a Deep Network

This paper contrasts and compares several techniques applied to Stacked Denoising Autoencoders and Deep Belief Networks, trained on several vision datasets, and shows that good qualitative interpretations of the high-level features represented by such models are possible at the unit level.

Exploring the representation capabilities of the HOG descriptor

The metameric class of HOG moments is introduced, which allows a target image to be morphed into an impostor image that shares the HOG representation of a source image while retaining its initial visual appearance.

Reconstructing an image from its local descriptors

This paper shows that an image can be approximately reconstructed from the output of black-box local description software such as that classically used for image indexing, and raises critical issues of privacy and rights when local descriptors of photos or videos are given away for indexing and search purposes.

Learning Deep Features for Scene Recognition using Places Database

A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets while having more diversity.

Intriguing properties of neural networks

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
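The dropout regularizer mentioned above can be sketched in a few lines. This is the common "inverted dropout" formulation (a detail assumed here, not taken from the paper), which rescales surviving activations at training time so the layer needs no scaling at test time:

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p during training,
    rescaling survivors by 1/(1-p) so the expected value is unchanged."""
    if not training:
        return a                        # identity at test time
    mask = rng.random(a.shape) >= p     # keep with probability 1 - p
    return a * mask / (1.0 - p)

a = np.ones(10_000)
print(dropout(a).mean())                # close to 1.0 in expectation
```

Randomly deleting co-adapted units in this way is what gives dropout its regularizing effect in the paper's experiments.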

Supervised translation-invariant sparse coding

Experiments show that the supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases and implying its great potential in handling large scale datasets in real applications.