Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation
@inproceedings{Xu2016MultiTaskZA, title={Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation}, author={Xun Xu and Timothy M. Hospedales and Shaogang Gong}, booktitle={European Conference on Computer Vision}, year={2016} }
Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing the conventional model training requirement of annotated examples for every category. This is achieved by establishing a mapping, referred to as the visual-semantic mapping, that connects low-level features to a semantic description of the label space and is learned on auxiliary data. Re-using the learned mapping to project target videos into the embedding space thus allows novel classes to be recognised by nearest neighbour inference. However…
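The generic pipeline described above (learn a visual-semantic mapping on auxiliary data, project a target video into the semantic embedding space, then label it by nearest neighbour among unseen-class descriptions) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's method. The mapping is assumed to be a single linear ridge regression and similarity is assumed to be cosine. All function names, class names and toy vectors are hypothetical. The multi-task mapping and prioritised data augmentation named in the paper's title are not modelled here.

```python
import math


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x p) matrix by a (p x m) matrix."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve A @ W = B for W by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]  # normalise the pivot row
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fit_visual_semantic_mapping(X, S, lam=1e-3):
    """Assumed ridge-regression mapping W = (X^T X + lam*I)^(-1) X^T S.

    X: visual features of auxiliary (seen-class) videos, one row per video.
    S: semantic (e.g. word) vector of each video's class label, one row per video.
    """
    Xt = transpose(X)
    gram = matmul(Xt, X)
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term keeps the linear system well conditioned
    return solve(gram, matmul(Xt, S))


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def predict_unseen(x, W, prototypes):
    """Project a target video into the semantic space and return the
    unseen class whose prototype is its nearest neighbour (cosine)."""
    s = matmul([x], W)[0]
    return max(prototypes, key=lambda name: cosine(s, prototypes[name]))


# Hypothetical toy data: rows of S_aux stand in for word vectors of three
# seen classes, and each visual feature is simply twice its class vector.
X_aux = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
S_aux = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
unseen = {"jog": [0.7, 0.7, 0.1], "skip": [0.1, 0.6, 0.8]}

W = fit_visual_semantic_mapping(X_aux, S_aux)
print(predict_unseen([1.8, 1.6, 0.1], W, unseen))  # jog
print(predict_unseen([0.2, 1.2, 1.6], W, unseen))  # skip
```

Because the toy features are a scaled copy of the semantic vectors, the learned W is close to 0.5 times the identity, so each target video is labelled with the unseen class whose prototype points in nearly the same direction.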
88 Citations
Crossmodal Representation Learning for Zero-shot Action Recognition
- Computer Science
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually new pipeline by which…
Aligned Dynamic-Preserving Embedding for Zero-Shot Action Recognition
- Computer Science
- IEEE Transactions on Circuits and Systems for Video Technology
- 2020
In this paper, a novel aligned dynamic-preserving embedding (ADPE) model for zero-shot action recognition in a transductive setting is proposed; it can effectively overcome the domain shift problem in zero-shot action recognition.
Spatiotemporal visual-semantic embedding network for zero-shot action recognition
- Computer Science
- J. Electronic Imaging
- 2019
A spatiotemporal visual-semantic embedding network (STVSEM) is proposed for zero-shot action recognition, with a joint embedding mechanism that explores and exploits the relationships between visual data and semantic information in an intermediate space to narrow the gap between vision and semantics; the network outperforms the state-of-the-art methods in accuracy.
Reformulating Zero-shot Action Recognition for Multi-label Actions
- Computer Science
- NeurIPS
- 2021
This work proposes a ZSAR framework which does not rely on nearest neighbor classification, but rather consists of a pairwise scoring function which allows for the prediction of several semantically distinct classes within one video input.
Fairer Evaluation of Zero Shot Action Recognition in Videos
- Computer Science
- VISIGRAPP
- 2021
It is shown that, in ZSL for HAR, accuracies for overlapping classes are boosted by between 5.75% and 51.94%, depending on the visual and semantic features used, as a result of a flawed evaluation protocol.
Holistically Associated Transductive Zero-Shot Learning
- Computer Science
- IEEE Transactions on Cognitive and Developmental Systems
- 2022
The proposed model is designed to combat two fundamental problems of ZSL, namely 1) representation learning and 2) label assignment for the unseen classes, and it outperforms state-of-the-art methods on substantially different data sets.
Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks
- Computer Science
- IJCCI
- 2021
This work aims to improve the GAN-based framework by incorporating object-based semantic information related to the class label through three approaches: replacing the class labels with objects, appending objects to the class, and averaging objects with the class.
Learning Using Privileged Information for Zero-Shot Action Recognition
- Computer Science
- ArXiv
- 2022
A simple hallucination network is proposed to implicitly extract object semantics during testing without explicitly extracting objects, and a cross-attention module is developed to augment the visual features with these object semantics.
Zero-shot learning for action recognition using synthesized features
- Computer Science
- Neurocomputing
- 2020
Transductive Zero-Shot Learning With Adaptive Structural Embedding
- Computer Science
- IEEE Transactions on Neural Networks and Learning Systems
- 2018
Two methods are presented: Adaptive STructural Embedding (ASTE) for visual-semantic embedding in the cross-modality learning step, and Self-PAced Selective Strategy (SPASS) for domain adaptation in the unseen class prediction step.
43 References (first 10 shown)
Semantic embedding space for zero-shot action recognition
- Computer Science
- 2015 IEEE International Conference on Image Processing (ICIP)
- 2015
This paper addresses zero-shot recognition in contemporary video action recognition tasks, using a semantic word-vector space as the common space in which to embed videos and category labels, and demonstrates that a simple self-training and data augmentation strategy can significantly improve the efficacy of the video-to-semantic mapping.
Transductive Zero-Shot Action Recognition by Word-Vector Embedding
- Computer Science
- International Journal of Computer Vision
- 2016
This study constructs a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. It achieves state-of-the-art zero-shot action recognition performance with a simple and efficient pipeline and without supervised annotation of attributes.
Transductive Multi-View Zero-Shot Learning
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
A novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in a transductive embedding space that rectifies the projection shift between the auxiliary and target domains and exploits the complementarity of multiple semantic representations; it significantly outperforms existing methods for both zero-shot and N-shot recognition.
Unsupervised Domain Adaptation for Zero-Shot Learning
- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A novel ZSL method is proposed based on unsupervised domain adaptation, which uses the projections of the target-domain class labels in the semantic space to regularise the learned target-domain projection, thus effectively overcoming the projection domain shift problem.
Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning
- Computer Science
- ACL
- 2015
This paper explores some general properties, both theoretical and empirical, of the cross-space mapping function, builds on them to propose better methods for estimating it, and achieves large improvements over the state of the art in both cross-linguistic and cross-modal zero-shot experiments.
Zero-Shot Learning Through Cross-Modal Transfer
- Computer Science
- NIPS
- 2013
This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.
Zero-shot object recognition by semantic manifold distance
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
The semantic manifold structure is used to redefine the distance metric in the semantic embedding space for more effective ZSL, and the proposed new model improves upon and seamlessly unifies various existing ZSL algorithms.
A Unified Perspective on Multi-Domain and Multi-Task Learning
- Computer Science
- ICLR
- 2015
This work proposes a framework that unifies multi-domain learning (MDL) and multi-task learning (MTL) and encompasses various classic and recent MTL/MDL algorithms by interpreting them as different ways of constructing semantic descriptors; this interpretation also provides an alternative pipeline for zero-shot learning (ZSL).
An embarrassingly simple approach to zero-shot learning
- Computer Science
- ICML
- 2015
This paper describes a zero-shot learning approach that can be implemented in just one line of code, yet is able to outperform state-of-the-art approaches on standard datasets.
Improving zero-shot learning by mitigating the hubness problem
- Computer Science
- ICLR
- 2015
The neighbourhoods of mapped elements are strongly polluted by hubs, vectors that tend to be near a high proportion of items, which push the correct labels down the neighbour list. A simple method is proposed to correct these neighbourhoods, leading to consistent improvements in realistic zero-shot experiments in the cross-lingual, image labeling and image retrieval domains.