Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition

@article{Liu2019HierarchicallyLV,
  title={Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition},
  author={Yang Liu and Zhaoyang Lu and Jing Li and Tao Yang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2019},
  volume={29},
  pages={2416-2430}
}
  • Yang Liu, Zhaoyang Lu, Jing Li, Tao Yang
  • Published 3 September 2018
  • IEEE Transactions on Circuits and Systems for Video Technology
Recognizing human actions from varied views is challenging due to the large appearance variations across views. The key to this problem is learning discriminative view-invariant representations that generalize well across views. In this paper, we address this problem by learning view-invariant representations hierarchically using a novel method, referred to as joint sparse representation and distribution adaptation. To obtain robust and informative feature representations, we first incorporate a… 
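To make the abstract's two ingredients concrete, the sketch below is a hypothetical illustration, not the paper's exact JSRDA formulation: it (i) learns one sparse-coding dictionary shared by two camera views and (ii) uses a simple linear-kernel MMD to gauge how far apart the two views' code distributions remain. Every function name, parameter, and the synthetic data here are assumptions made for this example only.

```python
# Minimal sketch (assumed setup, not the authors' method): shared-dictionary
# sparse coding over two views + a linear-kernel MMD as a distribution-gap probe.
import numpy as np

rng = np.random.default_rng(0)

def ista_codes(X, D, lam=0.1, n_iter=100):
    """Sparse codes A minimizing 0.5*||X - D A||_F^2 + lam*||A||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant (spectral norm squared)
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)           # gradient of the quadratic term
        A = A - grad / L
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold
    return A

def update_dictionary(X, A, eps=1e-8):
    """Least-squares dictionary update followed by unit-norm column rescaling."""
    D = X @ A.T @ np.linalg.pinv(A @ A.T + eps * np.eye(A.shape[0]))
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), eps)

def linear_mmd(Z_src, Z_tgt):
    """Squared MMD with a linear kernel: distance between sample means."""
    return float(np.sum((Z_src.mean(axis=1) - Z_tgt.mean(axis=1)) ** 2))

# Toy descriptors for two views of the same actions (purely synthetic data).
d, n, k = 64, 200, 32                      # feature dim, samples per view, atoms
base = rng.normal(size=(d, n))             # shared action content
X_src = base + 0.3 * rng.normal(size=(d, n))                       # source view
X_tgt = np.roll(base, 5, axis=0) + 0.3 * rng.normal(size=(d, n))   # target view

# Alternate sparse coding and dictionary updates on the pooled data so both
# views are forced through the same set of atoms (the shared representation).
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0, keepdims=True)
X_all = np.hstack([X_src, X_tgt])
for _ in range(10):
    A_all = ista_codes(X_all, D)
    D = update_dictionary(X_all, A_all)
A_all = ista_codes(X_all, D)               # final codes under the learned dictionary

A_src, A_tgt = A_all[:, :n], A_all[:, n:]
print("MMD between raw features:", linear_mmd(X_src, X_tgt))
print("MMD between sparse codes:", linear_mmd(A_src, A_tgt))
```

Pooling both views through one dictionary forces them onto common atoms; per the abstract, the actual method goes further by learning the representations hierarchically and performing distribution adaptation jointly with the sparse representation, rather than merely measuring the remaining gap as this sketch does.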
SSM-Based Joint Dictionary Learning for Cross-View Action Recognition
TLDR
A novel approach, referred to as SSM-based joint dictionary learning (SJDL), obtains more discriminative features for cross-view action recognition, as demonstrated by experimental results on the IXMAS dataset.
Joint Transferable Dictionary Learning and View Adaptation for Multi-view Human Action Recognition
TLDR
A transfer-learning-based framework, called the transferable dictionary learning and view adaptation (TDVA) model, for multi-view human action recognition, which progressively bridges the distribution gap among actions from different views through its two learning phases.
Learning Representations From Skeletal Self-Similarities for Cross-View Action Recognition
TLDR
This paper addresses the large variations in action representations that arise when actions are captured from totally different viewpoints, by learning view-invariant representations from skeletal self-similarities of varying scales with a lightweight multi-stream neural network (MSNN).
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition
TLDR
This paper proposes a novel method, named Deep Image-to-Video Adaptation and Fusion Networks (DIVAFN), to enhance action recognition in videos by transferring knowledge from images using video keyframes as a bridge; the method outperforms several state-of-the-art domain adaptation and action recognition methods.
SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
TLDR
This work presents an unsupervised variational approach to the compositional structure of a scene: the model, SIMONe, learns to infer two sets of latent representations from RGB video input, which allows it to represent object attributes in an allocentric manner that does not depend on viewpoint.
TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning
TLDR
A novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL) is proposed, which jointly models inter-snippet and intra-snippet temporal dependencies for temporal representation learning with a hybrid graph contrastive learning strategy.
Synthetic Humans for Action Recognition from Unseen Viewpoints
TLDR
This work makes use of the recent advances in monocular 3D human body reconstruction from real action sequences to automatically render synthetic training videos for the action labels, and introduces a new data generation methodology that allows training of spatio-temporal CNNs for action classification.
Audio-Visual Contrastive Learning for Self-supervised Action Recognition
TLDR
This paper presents an end-to-end self-supervised framework named Audio-Visual Contrastive Learning (AVCL) to learn discriminative audio-visual representations for action recognition, and designs an attention-based multi-modal fusion module (AMFM) to fuse the audio and visual modalities.
Semantics-Aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition
TLDR
This paper proposes a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.
Unsupervised Domain Adaptation via Importance Sampling
TLDR
This paper presents an importance sampling method for domain adaptation (ISDA), to measure sample contributions according to their “informative” levels, and shows that this method outperforms state-of-the-art methods under both the standard and partial domain adaptation settings.
...

References

SHOWING 1-10 OF 55 REFERENCES
Deeply Learned View-Invariant Features for Cross-View Action Recognition
TLDR
A novel sample-affinity matrix is introduced for learning shared features, which accurately balances information transfer within samples from multiple views and limits transfer across samples; the learned features outperform the state-of-the-art approaches.
Cross-View Action Recognition via Transferable Dictionary Learning
TLDR
Two effective approaches to learning dictionaries for robust action recognition across views are presented, and it is demonstrated that the proposed approaches outperform recently developed methods for cross-view action recognition.
Cross-View Action Recognition via a Transferable Dictionary Pair
TLDR
This work presents a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair, and extends the approach to transferring an action model learned from multiple source views to one target view.
Learning View-Invariant Sparse Representations for Cross-View Action Recognition
TLDR
This approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary; the common dictionary can represent actions from unseen views, which makes the approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels are available in the target view.
Cross-view action recognition via view knowledge transfer
TLDR
A novel approach to recognizing human actions from different views via view knowledge transfer, which transforms a bag-of-visual-words (BoVW) action model into a bag-of-bilingual-words (BoBW) model that is more discriminative in the presence of view changes.
Multitask Linear Discriminant Analysis for View Invariant Action Recognition
TLDR
This work proposes multitask linear discriminant analysis (LDA), a novel multitask learning framework for multiview action recognition that allows for the sharing of discriminative SSM features among different views (i.e., tasks) by choosing an appropriate class indicator matrix.
View-Invariant Action Recognition Using Latent Kernelized Structural SVM
TLDR
A novel learning algorithm is proposed for view-invariant action recognition, which extends the kernelized structural SVM framework to include latent variables and combines low-level visual cues, mid-level correlation descriptions, and high-level label information into a novel nonlinear kernel.
3D Action Recognition from Novel Viewpoints
  • Hossein Rahmani, A. Mian
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
TLDR
The proposed human pose representation model is able to generalize to real depth images of unseen poses without the need for re-training or fine-tuning, and dramatically outperforms the existing state of the art in action recognition.
Cross-View Action Modeling, Learning, and Recognition
TLDR
A novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, which takes advantage of 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous numbers of multi-view video frames, while recognition itself requires no 3D information and is based on 2D video input.
Discriminative virtual views for cross-view action recognition
TLDR
The proposed approach achieves improved or competitive performance relative to existing methods when instance correspondences or target labels are available, and it goes beyond the capabilities of these methods by providing some level of discrimination even when neither correspondences nor target labels exist.
...