Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation

Xun Xu, Timothy M. Hospedales, Shaogang Gong. European Conference on Computer Vision.
Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing the conventional model training requirement of annotated examples for every category. This is achieved by establishing a mapping between low-level features and a semantic description of the label space, referred to as the visual-semantic mapping, on auxiliary data. Re-using the learned mapping to project target videos into an embedding space thus allows novel classes to be recognised by nearest neighbour inference. However…
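The pipeline described in the abstract (learn a visual-semantic mapping on auxiliary data, project target videos, then label them by nearest neighbour among unseen-class descriptors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the mapping is a plain ridge regression, similarity is cosine, and the function names `fit_visual_semantic_map` and `predict_unseen` are hypothetical.

```python
import numpy as np


def fit_visual_semantic_map(X, Z, reg=1.0):
    """Fit a linear map W so that X @ W approximates Z.

    X: (n, d) visual features of auxiliary (seen-class) videos.
    Z: (n, k) semantic vector of each video's class label (e.g. a word vector).
    reg: ridge penalty; an assumed regulariser, chosen only for illustration.
    """
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + reg * I)^{-1} X^T Z
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Z)


def predict_unseen(X_target, W, prototypes):
    """Assign each target video to the nearest unseen-class prototype.

    X_target: (m, d) visual features of target videos.
    prototypes: (c, k) semantic vectors of the unseen class names.
    Returns an (m,) array of indices into `prototypes`.
    """
    projected = X_target @ W
    # L2-normalise both sides so the dot product is cosine similarity.
    projected = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + 1e-12)
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(projected @ protos.T, axis=1)
```

No target-class videos are used when fitting `W`; the unseen classes enter only through their semantic prototypes at prediction time, which is what makes the inference zero-shot.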

Crossmodal Representation Learning for Zero-shot Action Recognition

We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually new pipeline by which

Aligned Dynamic-Preserving Embedding for Zero-Shot Action Recognition

In this paper, a novel aligned dynamic-preserving embedding (ADPE) model for zero-shot action recognition in a transductive setting is proposed, which can effectively overcome the domain shift problem in zero-shot action recognition.

Spatiotemporal visual-semantic embedding network for zero-shot action recognition

A spatiotemporal visual-semantic embedding network (STVSEM) is proposed for zero-shot action recognition; it outperforms state-of-the-art methods in accuracy and uses a joint embedding mechanism that explores and exploits the relationships between the visual data and semantic information in an intermediate space to narrow the gap between vision and semantics.

Reformulating Zero-shot Action Recognition for Multi-label Actions

This work proposes a ZSAR framework which does not rely on nearest neighbor classification, but rather consists of a pairwise scoring function which allows for the prediction of several semantically distinct classes within one video input.

Fairer Evaluation of Zero Shot Action Recognition in Videos

It is shown that in the field of ZSL for HAR, accuracies for overlapping classes are being boosted by between 5.75% and 51.94%, depending on the visual and semantic features used, as a result of this flawed evaluation protocol.

Holistically Associated Transductive Zero-Shot Learning

The proposed model is designed to combat two fundamental problems of ZSL: 1) representation learning and 2) label assignment for the unseen classes, and it outperforms state-of-the-art methods on these substantially different data sets.

Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks

This work aims to improve the GAN-based framework by incorporating object-based semantic information related to the class label with three approaches: replacing the class labels with objects, appending objects to the class, and averaging objects with the class.

Learning Using Privileged Information for Zero-Shot Action Recognition

A simple hallucination network is proposed to implicitly extract object semantics during testing without explicitly extracting objects and a cross-attention module is developed to augment visual feature with the object semantics.

Transductive Zero-Shot Learning With Adaptive Structural Embedding

Two corresponding methods named Adaptive STructural Embedding (ASTE) and Self-PAced Selective Strategy (SPASS) for visual-semantic embedding and domain adaptation in cross-modality learning and unseen class prediction steps are presented.



Semantic embedding space for zero-shot action recognition

This paper addresses zero-shot recognition in contemporary video action recognition tasks, using semantic word vector space as the common space to embed videos and category labels, and demonstrates that a simple self-training and data augmentation strategy can significantly improve the efficacy of this mapping.

Transductive Zero-Shot Action Recognition by Word-Vector Embedding

This study constructs a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data, and achieves the state-of-the-art zero-shot action recognition performance with a simple and efficient pipeline, and without supervised annotation of attributes.

Transductive Multi-View Zero-Shot Learning

A novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space that rectifies the projection shift between the auxiliary and target domains, exploits the complementarity of multiple semantic representations, and significantly outperforms existing methods for both zero-shot and N-shot recognition.

Unsupervised Domain Adaptation for Zero-Shot Learning

A novel ZSL method is proposed based on unsupervised domain adaptation which uses the target domain class labels' projections in the semantic space to regularise the learned target domain projection thus effectively overcoming the projection domain shift problem.

Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning

This paper explores some general properties, both theoretical and empirical, of the cross-space mapping function, and builds on them to propose better methods to estimate it, and achieves large improvements over the state of the art, both in cross-linguistic and cross-modal zero-shot experiments.

Zero-Shot Learning Through Cross-Modal Transfer

This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.

Zero-shot object recognition by semantic manifold distance

The semantic manifold structure is used to redefine the distance metric in the semantic embedding space for more effective ZSL, and the proposed new model improves upon and seamlessly unifies various existing ZSL algorithms.

A Unified Perspective on Multi-Domain and Multi-Task Learning

This framework unifies MDL and MTL as well as encompassing various classic and recent MTL/MDL algorithms by interpreting them as different ways of constructing semantic descriptors, which provides an alternative pipeline for zero-shot learning (ZSL).

An embarrassingly simple approach to zero-shot learning

This paper describes a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets.

Improving zero-shot learning by mitigating the hubness problem

A simple method is proposed to correct the problem that the neighbourhoods of the mapped elements are strongly polluted by hubs, vectors that tend to be near a high proportion of items and so push the correct labels down the neighbour list. The correction leads to consistent improvements in realistic zero-shot experiments in the cross-lingual, image labeling and image retrieval domains.