Corpus ID: 1836951

Deep Visual Analogy-Making

@inproceedings{Reed2015DeepVA,
  title={Deep Visual Analogy-Making},
  author={Scott E. Reed and Yi Zhang and Yuting Zhang and Honglak Lee},
  booktitle={NIPS},
  year={2015}
}
In addition to identifying the content within a single image, relating images and generating related images are critical tasks for image understanding. Recently, deep convolutional networks have yielded breakthroughs in predicting image labels, annotations and captions, but have only just begun to be used for generating high-quality images. In this paper we develop a novel deep network trained end-to-end to perform visual analogy making, which is the task of transforming a query image according… 
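The additive variant of the model described above composes the analogy a : b :: c : d directly in a learned embedding space, decoding d ≈ g(f(b) − f(a) + f(c)) for an encoder f and decoder g. A minimal NumPy sketch of that additive objective, using random linear maps as illustrative stand-ins for the learned deep networks (the names W_enc and W_dec and the pixel-shift toy transformation are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned encoder f and decoder g.
# In the paper these are deep convolutional networks trained end-to-end;
# here they are random linear maps purely to illustrate the additive rule.
D_img, D_z = 64, 16
W_enc = rng.normal(size=(D_z, D_img)) / np.sqrt(D_img)
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder

def f(x):
    """Encoder: image vector -> embedding."""
    return W_enc @ x

def g(z):
    """Decoder: embedding -> image vector."""
    return W_dec @ z

# Analogy a : b :: c : d -- apply the (a -> b) transformation to c.
a = rng.normal(size=D_img)
c = rng.normal(size=D_img)
t = 0.1 * rng.normal(size=D_img)  # toy transformation (an additive shift)
b, d_true = a + t, c + t

# Additive analogy-making: transform c by the embedding-space difference.
d_pred = g(f(b) - f(a) + f(c))
```

With linear maps the analogy holds exactly in embedding space (f(d_pred) matches f(d_true)); the paper's contribution is learning f and g so that the same vector arithmetic produces realistic transformed images.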

Citations

Visual attribute transfer through deep image analogy
TLDR
The technique finds semantically-meaningful dense correspondences between two input images by adapting the notion of "image analogy" with features extracted from a Deep Convolutional Neural Network for matching, and is called deep image analogy.
Semantic Image Analogy with a Conditional Single-Image GAN
TLDR
This work proposes a novel method to model the patch-level correspondence between semantic layout and appearance of a single image by training a single-image GAN that takes semantic labels as conditional input.
Learning to detect visual relations
TLDR
A weakly-supervised approach is proposed which, given pre-trained object detectors, enables relation detectors to be learned from image-level labels only, maintaining performance close to fully-supervised models.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
TLDR
A novel approach that models future frames in a probabilistic manner is proposed, namely a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively.
From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators
TLDR
This network is a modified variational autoencoder that supports supervised training of single-image analogies and in-network evaluation of outputs with a structured similarity objective that captures pixel covariances.
Detecting Unseen Visual Relations Using Analogies
TLDR
This work learns a representation of visual relations that combines individual embeddings for subject, object and predicate together with a visual phrase embedding that represents the relation triplet, and demonstrates the benefits of this approach on three challenging datasets.
Few-shot Visual Reasoning with Meta-analogical Contrastive Learning
TLDR
This work meta-learns its analogical contrastive learning model over the same tasks with diverse attributes, and shows that it generalizes to the same visual reasoning problem with unseen attributes.
Leveraging structure in Computer Vision tasks for flexible Deep Learning models
TLDR
This thesis argues that, in contrast to the usual black-box behavior of neural networks, leveraging more structured internal representations is a promising direction for tackling problems, and focuses on two forms of structure, compositional architectures and modularity.
Representation Learning by Learning to Count
TLDR
This paper uses two image transformations, scaling and tiling, in the context of counting to train a neural network with a contrastive loss, producing representations that perform on par with or exceed the state of the art in transfer learning benchmarks.
Image Analogy with Gaussian Process
TLDR
This work proposes an image analogy method using a Gaussian process that performs significantly better than a DNN when the dataset is small, and proposes novel sampling methods that select salient instances from a given dataset.

References

SHOWING 1-10 OF 33 REFERENCES
Analogy-preserving Semantic Embedding for Visual Object Categorization
TLDR
Analogy-preserving Semantic Embedding (ASE) is proposed to model analogies that reflect the relationships between multiple pairs of classes simultaneously, in the form "p is to q, as r is to s".
Deep Convolutional Inverse Graphics Network
This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN), a model that aims to learn an interpretable representation of images, disentangled with respect to three-dimensional scene…
Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines
TLDR
A low-rank approximation to this interaction tensor that uses a sum of factors, each of which is a three-way outer product, which allows efficient learning of transformations between larger image patches and demonstrates the learning of optimal filter pairs from various synthetic and real image sequences.
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
TLDR
A novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image and allows the model to capture long-term dependencies along a sequence of transformations.
Image analogies
TLDR
This paper describes a new framework for processing images by example, called “image analogies,” based on a simple multi-scale autoregression, inspired primarily by recent results in texture synthesis.
"Mental Rotation" by Optimizing Transforming Distance
TLDR
A trained relational model actively transforms pairs of examples so that they are maximally similar in some feature space yet respect the learned transformational constraints, in order to facilitate a search over a learned space of transformations.
Caffe: Convolutional Architecture for Fast Feature Embedding
TLDR
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Modeling the joint density of two images under a variety of transformations
TLDR
The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs, which makes it possible to efficiently match images that are the same according to a learned measure of similarity.
Transformation Properties of Learned Visual Representations
TLDR
It is demonstrated, in a model of rotating NORB objects that employs a latent representation of the non-commutative 3D rotation group SO(3), that the learned representation is equivalent to a combination of the elementary irreducible representations.
Learning to generate chairs with convolutional neural networks
TLDR
This work trains a generative convolutional neural network which is able to generate images of objects given object type, viewpoint, and color and shows that the network can be used to find correspondences between different chairs from the dataset, outperforming existing approaches on this task.