Corpus ID: 551912

Visalogy: Answering Visual Analogy Questions

@inproceedings{Sadeghi2015VisalogyAV,
  title={Visalogy: Answering Visual Analogy Questions},
  author={Fereshteh Sadeghi and C. L. Zitnick and Ali Farhadi},
  booktitle={NIPS},
  year={2015}
}
In this paper, we study the problem of answering visual analogy questions. These questions take the form "image A is to image B as image C is to what?" Answering them entails discovering the mapping from image A to image B, extending that mapping to image C, and searching for the image D such that the relation from A to B holds from C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together…
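The retrieval step the abstract describes can be illustrated with the classic vector-offset heuristic: represent each image as a point in the learned embedding space, form the query D ≈ C + (B − A), and return the nearest gallery image. This is a minimal sketch under that assumption; the embeddings and the `answer_analogy` helper are illustrative, not the paper's actual network or API.

```python
import numpy as np

# Hypothetical pre-computed, L2-normalized image embeddings standing in for
# the output of a learned embedding network (1000 candidate images, 64 dims).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 64))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

def answer_analogy(emb_a, emb_b, emb_c, gallery):
    """Return the index of the gallery image D such that A : B :: C : D,
    using the vector-offset heuristic D ~ C + (B - A)."""
    query = emb_c + (emb_b - emb_a)
    query /= np.linalg.norm(query)
    # With unit-norm rows, the dot product is cosine similarity.
    return int(np.argmax(gallery @ query))
```

With one-hot toy embeddings where the A-to-B transformation swaps one active dimension for another, the retrieved D is the image whose embedding applies the same swap to C.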
Detecting Unseen Visual Relations Using Analogies
This work learns a representation of visual relations that combines individual embeddings for subject, object, and predicate together with a visual phrase embedding that represents the relation triplet, and demonstrates the benefits of this approach on three challenging datasets.
Learning to detect visual relations
A weakly-supervised approach is proposed which, given pre-trained object detectors, enables learning relation detectors from image-level labels only, maintaining performance close to fully-supervised models.
From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators
This network is a modified variational autoencoder that supports supervised training of single-image analogies and in-network evaluation of outputs with a structured similarity objective that captures pixel covariances.
Visual analogy: Deep learning versus compositional models
This work compares human performance on visual analogies created using images of familiar three-dimensional objects (cars and their subregions) with the performance of alternative computational models, which generate qualitative performance similar to that of human reasoners.
Quartet-net Learning for Visual Instance Retrieval
This paper proposes quartet-net learning to improve the discriminative power of CNN features for instance retrieval, adopting a double-margin contrastive loss with a dynamic margin-tuning strategy to train the network, which leads to more robust performance.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
A novel approach that models future frames in a probabilistic manner is proposed, namely a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively.
Seeing the Meaning: Vision Meets Semantics in Solving Pictorial Analogy Problems
This first effort to model the solution of meaningful four-term visual analogies, by combining a machine-vision model that can classify pixel-level images into object categories with a cognitive model that takes semantic representations of words as input and identifies semantic relations instantiated by a word pair, provides a proof of concept that a comprehensive model can solve semantically-rich analogies from pixel-level inputs.
Neural Scene De-rendering
This work proposes a new approach to learn an interpretable distributed representation of scenes, using a deterministic rendering function as the decoder and an object-proposal-based encoder that is trained by minimizing both the supervised prediction and the unsupervised reconstruction errors.
Contextual Visual Similarity
This work examines the concept of contextual visual similarity in the application domain of image search, and proposes a contextualized similarity search criterion that requires three images to be provided.
Computer Vision and Natural Language Processing
This survey provides a comprehensive introduction to the integration of computer vision and natural language processing in multimedia and robotics applications, with more than 200 key references, and presents a unified view of the field along with possible future directions.

References

Showing 1–10 of 31 references
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose…
Visual Madlibs: Fill in the Blank Description Generation and Question Answering
A new dataset consisting of 360,001 focused natural language descriptions for 10,738 images is introduced, and its applicability to two new description generation tasks is demonstrated: focused description generation, and multiple-choice question-answering for images.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language…
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
This work proposes a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision, via a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework.
Analogy-preserving Semantic Embedding for Visual Object Categorization
Analogy-preserving Semantic Embedding (ASE) is proposed to model analogies that reflect the relationships between multiple pairs of classes simultaneously, in the form "p is to q, as r is to s".
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
This work introduces the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder, and shows that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic.
VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases
This work introduces the problem of visual verification of relation phrases and develops a Visual Knowledge Extraction system called VisKE, which has been used not only to enrich existing textual knowledge bases by improving their recall, but also to augment open-domain question-answer reasoning.
Linguistic Regularities in Sparse and Explicit Word Representations
It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.
Relative attributes
This work proposes a generative model over the joint space of attribute ranking outputs, and proposes a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, ‘bears are furrier than giraffes’).
Corpus-based Learning of Analogies and Semantic Relations
An algorithm for learning from unlabeled text is presented that can solve verbal analogy questions of the kind found in the SAT college entrance exam, and that is state-of-the-art for both verbal analogies and noun-modifier relations.