The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

@inproceedings{zhang2018perceptual,
  title={The Unreasonable Effectiveness of Deep Features as a Perceptual Metric},
  author={Richard Zhang and Phillip Isola and Alexei A. Efros and Eli Shechtman and Oliver Wang},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. […] More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
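The distance behind this result averages channel-normalized deep-feature differences across layers, optionally reweighted per channel. A minimal NumPy sketch of such a distance, using random arrays as stand-ins for the activations of a pretrained network (the stack shapes below are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def lpips_like_distance(feats_a, feats_b, weights=None):
    """LPIPS-style distance: per layer, unit-normalize feature stacks
    along the channel axis, take the (optionally channel-weighted)
    squared difference, average over space, then sum over layers."""
    total = 0.0
    for l, (fa, fb) in enumerate(zip(feats_a, feats_b)):
        # fa, fb: (C, H, W) activations from the same layer of a deep net
        na = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-10)
        nb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-10)
        diff2 = (na - nb) ** 2                    # (C, H, W)
        if weights is not None:
            diff2 = weights[l][:, None, None] * diff2
        total += diff2.sum(axis=0).mean()         # spatial average
    return total

# Toy usage: two 2-layer "activation stacks", the second a perturbed copy.
rng = np.random.default_rng(0)
feats_x = [rng.standard_normal((8, 4, 4)), rng.standard_normal((16, 2, 2))]
feats_y = [f + 0.1 * rng.standard_normal(f.shape) for f in feats_x]
d = lpips_like_distance(feats_x, feats_y)   # small positive distance
```

In the paper's learned variant, the per-channel weights are fit to human perceptual judgments; with `weights=None` this reduces to a plain normalized feature-space distance.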


On the surprising tradeoff between ImageNet accuracy and perceptual similarity

A large-scale study examines the ImageNet accuracy/Perceptual Score relationship while varying depth, width, number of training steps, weight decay, label smoothing, and dropout, and finds shallow ResNets, trained for fewer than 5 epochs on ImageNet alone, whose emergent Perceptual Score matches the prior best networks trained directly on supervised human perceptual judgements.

Why Are Deep Representations Good Perceptual Quality Features?

It is demonstrated that the pre-trained CNN features which receive higher scores are better at predicting human quality judgment, and the possibility of using the method to select deep features to form a new loss function, which improves the image reconstruction quality for the well-known single-image super-resolution problem.

Do better ImageNet classifiers assess perceptual similarity better?

A large-scale empirical study to assess how well ImageNet classifiers perform on perceptual similarity finds a Pareto frontier between accuracies and Perceptual Score in the mid-to-high accuracy regime.

Totally Looks Like - How Humans Compare, Compared to Machines

A new dataset dubbed Totally-Looks-Like (TLL) is introduced, which contains images paired by humans as being visually similar, and it is shown that machine-extracted representations perform very poorly in terms of reproducing the matching selected by humans.

Understanding and Simplifying Perceptual Distances

  • D. Amir, Y. Weiss
  • Computer Science, Environmental Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
This paper uses the tool of infinite CNNs to derive an analytical form for perceptual similarity in such CNNs, and proves that the perceptual distance between two images is equivalent to the maximum mean discrepancy (MMD) distance between local distributions of small patches in the two images.
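The MMD equivalence can be illustrated directly: extract small patches from each image, treat them as samples from two distributions, and compute a kernel MMD between them. A hedged sketch with a Gaussian kernel (the patch size and bandwidth here are illustrative choices, not the paper's derivation):

```python
import numpy as np

def patches(img, k=3):
    """All overlapping k-by-k patches of a 2-D image, flattened to vectors."""
    H, W = img.shape
    return np.array([img[i:i+k, j:j+k].ravel()
                     for i in range(H - k + 1) for j in range(W - k + 1)])

def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy (Gaussian kernel)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

# Toy usage: a distorted image shifts the local patch statistics.
rng = np.random.default_rng(1)
img_a = rng.random((8, 8))
img_b = np.clip(img_a + 0.2 * rng.standard_normal((8, 8)), 0.0, 1.0)
Pa, Pb = patches(img_a), patches(img_b)
d = mmd2(Pa, Pb)   # positive: the two patch distributions differ
```

The biased estimator is the squared norm of the difference between kernel mean embeddings, so it is zero only when the two patch samples coincide.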

E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles

Evidence of "perceptual convexity" is found by showing that convex combinations of similar-looking images retain appearance, and that discrete geodesics yield meaningful frame interpolation and texture morphing, all without explicit correspondence.

Perceptnet: A Human Visual System Inspired Neural Network For Estimating Perceptual Distance

This work presents PerceptNet, a convolutional neural network whose architecture has been chosen to reflect the structure and various stages of the human visual system, and shows that including a nonlinearity inspired by the human visual system in classical deep neural network architectures can increase their ability to judge perceptual similarity.

Using deep perceptual embeddings as a quality metric for synthetic imagery

This work proposes a new evaluation metric that both correlates with perceptual quality and operates at the data level so that it can function on datasets from any domain and demonstrates efficacy of this metric on the CIFAR-10 dataset.

Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings

  • Brett D. Roads, B. Love
  • Computer Science, Psychology
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
A publicly available dataset that embodies the task-general capabilities of human perception and reasoning is presented, and the similarity ratings and the embedding space are used to evaluate how well several popular models conform to human similarity judgments.

Hierarchical Auto-Regressive Model for Image Compression Incorporating Object Saliency and a Deep Perceptual Loss

This work empirically demonstrates that popularly used evaluation metrics such as MS-SSIM and PSNR are inadequate for judging the performance of deep learned image compression techniques, as they do not align well with human perceptual similarity.
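For context, PSNR is a pure pixel-space quantity, blind to where errors fall perceptually, which is why it can disagree with human judgments. A minimal NumPy implementation for images scaled to [0, 1] (the toy images are illustrative):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")   # identical images: PSNR is unbounded
    return 10.0 * np.log10(max_val ** 2 / mse)

# A small uniform brightness shift is nearly invisible to a human,
# yet PSNR penalizes it exactly as much as any error of equal MSE.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
shifted = np.clip(img + 0.05, 0.0, 1.0)
d = psnr(img, shifted)
```

Because PSNR depends only on mean squared error, two distortions with equal MSE score identically regardless of how visible each is — the mismatch with perception that the work above documents.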

NIMA: Neural Image Assessment

The proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks and can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline.

Learned perceptual image enhancement

This paper shows that adding a learned no-reference image quality metric to the loss can significantly improve enhancement operators and can be effective for tuning a variety of operators such as local tone mapping and dehazing.

Generating Images with Perceptual Similarity Metrics based on Deep Networks

A class of loss functions, called deep perceptual similarity metrics (DeePSiM), is proposed; these compute distances between image features extracted by deep neural networks, better reflect the perceptual similarity of images, and thus lead to better results.

Context Encoders: Feature Learning by Inpainting

It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.

DeepSim: Deep similarity for image quality assessment

Data-dependent Initializations of Convolutional Neural Networks

This work presents a fast and simple data-dependent initialization procedure, that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients.

Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework

  • Jongyoo Kim, Sanghoon Lee
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
A novel convolutional neural network (CNN)-based FR-IQA model, named Deep Image Quality Assessment (DeepQA), in which the behavior of the HVS is learned from the underlying data distribution of IQA databases, achieving state-of-the-art prediction accuracy among FR-IQA models.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Unsupervised Visual Representation Learning by Context Prediction

It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

A novel unsupervised learning approach to build features suitable for object detection and classification is introduced: the context-free network (CFN), a siamese-ennead convolutional neural network, which also facilitates the transfer of features to other tasks.