• Corpus ID: 195767265

Visual Space Optimization for Zero-shot Learning

  • Xinsheng Wang, Shanmin Pang, Jihua Zhu, Zhongyu Li, Zhiqiang Tian, Yaochen Li
Zero-shot learning, which aims to recognize new categories that are not included in the training set, has gained popularity owing to its potential in real-world applications. Zero-shot learning models rely on learning an embedding space, where both semantic descriptions of classes and visual features of instances can be embedded for nearest neighbor search. Recently, most existing works consider the visual space formulated by deep visual features as an ideal choice of the…
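The pipeline this abstract describes — embedding class descriptions and image features into a shared space, then classifying by nearest-neighbor search — can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the linear map `W`, the attribute vectors, and all dimensions are made-up assumptions.

```python
import numpy as np

# Toy setup: 3 unseen classes described by 4-dim attribute vectors,
# and a (hypothetical, randomly initialized) linear map W that embeds
# attributes into a 5-dim visual feature space.
rng = np.random.default_rng(0)
class_attrs = rng.normal(size=(3, 4))   # semantic descriptions of classes
W = rng.normal(size=(4, 5))             # stand-in for a learned embedding
class_protos = class_attrs @ W          # class prototypes in visual space

def classify(visual_feat):
    """Nearest-neighbor search: return the index of the class whose
    embedded prototype has the highest cosine similarity with the
    image's visual feature."""
    protos = class_protos / np.linalg.norm(class_protos, axis=1, keepdims=True)
    feat = visual_feat / np.linalg.norm(visual_feat)
    return int(np.argmax(protos @ feat))

# A feature lying exactly on class 1's prototype is assigned to class 1.
pred = classify(class_protos[1])
```

In a real system `W` would be trained (e.g. by regression from attributes to averaged per-class features); the search step itself is unchanged.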

Trading-off Information Modalities in Zero-shot Classification

Two different formulations are proposed that allow us to explicitly trade-off the relative importance of the visual and semantic spaces for classification in a zero-shot setting and demonstrate that this approach competes favorably with the state of the art on both the standard and generalized settings.

Semantics-Guided Contrastive Network for Zero-Shot Object detection

A novel Semantics-Guided Contrastive Network for ZSD (ContrastZSD), a detection framework that first brings the contrastive learning paradigm into the realm of ZSD; under the guidance of explicit semantic supervision, the model can learn more knowledge about unseen categories and avoid over-fitting to the seen concepts.

Domain segmentation and adjustment for generalized zero-shot learning

This paper proposes to realize generalized zero-shot recognition in different domains, and proposes a joint threshold and probabilistic-distribution method to segment the testing instances into seen, unseen and uncertain domains, which avoids mutual interference between the seen and unseen classes.

Look, Listen and Infer

For the first time, a Look, Listen and Infer Network (LLINet) is proposed to learn a zero-shot model that can infer the relations of visual scenes and sounds from novel categories that have never appeared before, indicating that zero-shot learning for visual scenes and sounds is feasible.

Locality and compositionality in zero-shot learning

The results of these experiments show how locality, in terms of small parts of the input, and compositionality, i.e., how well the learned representations can be expressed as a function of a smaller vocabulary, are both deeply related to generalization, and motivate a focus on more locality-aware models in future research directions for representation learning.

Learning to Infer Unseen Attribute-Object Compositions

A graph-based model is proposed that can flexibly recognize both single and multi-attribute-object compositions, and maps the visual features of images and the attribute-object category labels represented by word embedding vectors into a latent space.

Learning a Deep Embedding Model for Zero-Shot Learning

  • Li Zhang, T. Xiang, S. Gong
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
This paper proposes to use the visual space as the embedding space instead of embedding into a semantic space or an intermediate space, and argues that in this space, the subsequent nearest neighbour search would suffer much less from the hubness problem and thus become more effective.

Zero-Shot Learning Through Cross-Modal Transfer

This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.

Synthesized Classifiers for Zero-Shot Learning

This work introduces a set of "phantom" object classes whose coordinates live in both the semantic space and the model space and demonstrates superior accuracy of this approach over the state of the art on four benchmark datasets for zero-shot learning.

DeViSE: A Deep Visual-Semantic Embedding Model

This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.

Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

This work proposes a coupled dictionary learning approach to align the visual-semantic structures using the class prototypes, where the discriminative information lying in the visual space is utilized to improve the less discriminative semantic space.

Semantic Autoencoder for Zero-Shot Learning

This work presents a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE), which significantly outperforms existing ZSL models with the additional benefit of lower computational cost, and beats the state of the art when the SAE is applied to the supervised clustering problem.
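Part of the SAE's low computational cost comes from its objective — project visual features to semantics while requiring the same matrix to reconstruct the features — admitting a closed-form solution as a Sylvester equation. A hedged sketch with synthetic data (dimensions, the trade-off weight `lam`, and all values are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Toy data: X holds d-dim visual features for N images (as columns),
# S holds k-dim semantic vectors for the same images.
rng = np.random.default_rng(0)
d, k, N, lam = 6, 3, 50, 0.5
X = rng.normal(size=(d, N))
S = rng.normal(size=(k, N))

# Minimizing ||X - W.T @ S||^2 + lam * ||W @ X - S||^2 over W leads to
# the Sylvester equation  (S S^T) W + W (lam X X^T) = (1 + lam) S X^T,
# solvable directly with scipy.linalg.solve_sylvester.
A = S @ S.T
B = lam * (X @ X.T)
Q = (1 + lam) * (S @ X.T)
W = solve_sylvester(A, B, Q)   # k x d map, used in both directions

residual = np.linalg.norm(A @ W + W @ B - Q)  # should be near zero
```

Setting the gradient of the two squared-error terms to zero gives exactly the `AW + WB = Q` form above, which is why no iterative optimization is needed.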

Zero-Shot Learning Using Synthesised Unseen Visual Data with Diffusion Regularisation

A novel embedding algorithm named Unseen Visual Data Synthesis (UVDS) that projects semantic features into the high-dimensional visual feature space, and introduces a latent embedding space which aims to reconcile the structural difference between the visual and semantic spaces while preserving the local structure.

Zero-Shot Learning by Convex Combination of Semantic Embeddings

A simple method is proposed for constructing an image embedding system from any existing image classifier and a semantic word embedding model which contains the $n$ class labels in its vocabulary; it outperforms state-of-the-art methods on the ImageNet zero-shot learning task.
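The convex-combination idea this abstract refers to can be illustrated in a few lines: weight the word embeddings of the classifier's top predicted seen classes by their predicted probabilities, then match the combined vector against unseen-class embeddings. All embeddings and probabilities below are synthetic stand-ins, not the paper's data:

```python
import numpy as np

# Hypothetical word vectors for 5 seen and 2 unseen classes (8-dim).
rng = np.random.default_rng(1)
seen_embeds = rng.normal(size=(5, 8))
unseen_embeds = rng.normal(size=(2, 8))

# Softmax-style output of a seen-class classifier, truncated to its
# top entries (remaining classes get zero weight).
probs = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
combined = probs @ seen_embeds   # convex combination of seen embeddings

# Predict the unseen class whose embedding is most cosine-similar.
sims = (unseen_embeds @ combined) / (
    np.linalg.norm(unseen_embeds, axis=1) * np.linalg.norm(combined))
pred = int(np.argmax(sims))
```

No new training is required: the image classifier and the word embedding model are reused as-is, which is the method's main appeal.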

Marginalized Latent Semantic Encoder for Zero-Shot Learning

  • Zhengming Ding, Hongfu Liu
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
This paper designs a Marginalized Latent Semantic Encoder (MLSE), which is learned on the augmented seen visual features and the latent semantic representation, and whose latent semantics are discovered under an adaptive graph reconstruction scheme based on the provided semantics.

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

This work proposes a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders, and align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes.