Corpus ID: 236428541

What Remains of Visual Semantic Embeddings

Yue Jiao, Jonathon S. Hare, A. Prügel-Bennett
Zero-shot learning (ZSL) has seen a surge in interest over the past decade because of its close links with the mechanism by which young children recognize novel objects. Although different paradigms of visual semantic embedding models are designed to align visual features and distributed word representations, it is unclear to what extent current ZSL models encode semantic information from distributed word representations. In this work, we introduce the split of tieredImageNet to the ZSL task, in order to…

DeViSE: A Deep Visual-Semantic Embedding Model
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text, and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
This paper builds upon the recently introduced Graph Convolutional Network (GCN) and proposes an approach that uses both semantic embeddings and categorical relationships from a knowledge graph (KG) to predict the classifiers, and shows that it is robust to noise in the KG.
Probing Text Models for Common Ground with Visual Representations
It is found that representations from models trained on purely textual data, such as BERT, can be nontrivially mapped to those of a vision model, and the context surrounding objects in sentences greatly impacts performance.
SCAN: Learning Hierarchical Compositional Visual Concepts
SCAN (Symbol-Concept Association Network), a new framework for learning such abstractions in the visual domain that allows for traversal and manipulation of the implicit hierarchy of visual concepts through symbolic instructions and learnt logical recombination operations, is described.
Zero-Shot Learning Through Cross-Modal Transfer
This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.
Locality and compositionality in zero-shot learning
The results of these experiments show how locality, in terms of small parts of the input, and compositionality, i.e. how well the learned representations can be expressed as a function of a smaller vocabulary, are both deeply related to generalization, and they motivate a focus on more local-aware models in future research directions for representation learning.
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
This work proposes a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders, and aligns the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes.
Visual and semantic similarity in ImageNet
The insights gained from analysis enable building a novel distance function between images assessing whether they are from the same basic-level category, which goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet.
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
A Hyperbolic Visual Embedding Learning Network for zero-shot recognition that is more robust because the embedding features in hyperbolic space better represent the class hierarchy and thereby avoid being misled by unrelated siblings.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.