Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition

@article{Liu2019HierarchicallyLV,
  title={Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition},
  author={Yang Liu and Zhaoyang Lu and Jing Li and Tao Yang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2019},
  volume={29},
  pages={2416--2430}
}
Recognizing human actions from varied views is challenging due to huge appearance variations in different views. The key to this problem is to learn discriminant view-invariant representations generalizing well across views. In this paper, we address this problem by learning view-invariant representations hierarchically using a novel method, referred to as joint sparse representation and distribution adaptation. To obtain robust and informative feature representations, we first incorporate a…
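The abstract above relies on sparse representation over a learned dictionary. As background, here is a minimal sketch of sparse coding via greedy orthogonal matching pursuit (OMP); this is an illustrative example of the generic technique only, not the paper's joint sparse representation and distribution adaptation method, and all names and sizes here are arbitrary.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x as a k-sparse
    combination of dictionary atoms (columns of D)."""
    residual = x.copy()
    support = []
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Least-squares fit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
true_code = np.zeros(128)
true_code[[3, 40, 99]] = [1.5, -2.0, 0.7]      # a 3-sparse ground-truth code
x = D @ true_code
code = omp(D, x, k=3)
print(np.nonzero(code)[0])                     # indices of the selected atoms
```

Cross-view methods in this vein typically couple such codes across views, e.g. by forcing correspondence videos from different views to share the same sparse code over view-specific dictionaries.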
SSM-Based Joint Dictionary Learning for Cross-View Action Recognition
A novel approach, referred to as SSM-based joint dictionary learning (SJDL), which can obtain more discriminative features for cross-view action recognition, as demonstrated by experimental results on the IXMAS dataset.
Joint Transferable Dictionary Learning and View Adaptation for Multi-view Human Action Recognition
A transfer learning-based framework, transferable dictionary learning and view adaptation (TDVA), for multi-view human action recognition that progressively bridges the distribution gap among actions from various views in two phases.
Learning Representations From Skeletal Self-Similarities for Cross-View Action Recognition
This paper addresses the problem of large variations of action representations as actions are captured from totally different viewpoints by learning view-invariant representations from skeletal self-similarities of varying scales with a very light multi-stream neural network (MSNN).
Conflux LSTMs Network: A Novel Approach for Multi-View Action Recognition
A conflux long short-term memory (LSTM) network that recognizes actions from multi-view cameras, using flattened layers followed by a SoftMax classifier; experimental results on benchmark datasets report an improvement over the state of the art.
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition
This paper proposes a novel method, named Deep Image-to-Video Adaptation and Fusion Networks (DIVAFN), to enhance action recognition in videos by transferring knowledge from images using video keyframes as a bridge; it outperforms some state-of-the-art domain adaptation and action recognition methods.
SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
This work presents an unsupervised variational approach to the compositional structure of any given scene, learning to infer two sets of latent representations from RGB video input; this allows the model, SIMONe, to represent object attributes in an allocentric manner that does not depend on viewpoint.
Synthetic Humans for Action Recognition from Unseen Viewpoints
Recent advances in monocular 3D human body reconstruction from real action sequences are used to automatically render synthetic training videos for the action labels, improving human action recognition for viewpoints unseen during training.
Temporal Contrastive Graph for Self-supervised Video Representation Learning
This work takes a closer look at exploiting the temporal structure of videos and proposes a novel self-supervised method named Temporal Contrastive Graph (TCG), which integrates the prior knowledge about the frame and snippet orders into temporal contrastive graph structures to well preserve the local and global temporal relationships among video frame-sets and snippets.
Unsupervised Domain Adaptation via Importance Sampling
An importance sampling method for domain adaptation (ISDA) that measures sample contributions according to their “informative” levels; the method is shown to outperform state-of-the-art methods under both the standard and partial domain adaptation settings.
A Novel Approach for Robust Multi Human Action Recognition and Summarization based on 3D Convolutional Neural Networks.
The proposed method provides accurate multi-human action recognition that can easily be used for summarization of any action; its efficiency compared to state-of-the-art methods is demonstrated.

References

SHOWING 1-10 OF 57 REFERENCES
Deeply Learned View-Invariant Features for Cross-View Action Recognition
A novel sample-affinity matrix is introduced in learning shared features, which accurately balances information transfer within the samples from multiple views and limits the transfer across samples, and which outperforms the state-of-the-art approaches.
Cross-View Action Recognition via Transferable Dictionary Learning
Two effective approaches to learn dictionaries for robust action recognition across views are presented, and it is demonstrated that the proposed approach outperforms recently developed approaches for cross-view action recognition.
Cross-View Action Recognition via a Transferable Dictionary Pair
This work presents a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair, and extends the approach to transferring an action model learned from multiple source views to one target view.
Learning View-Invariant Sparse Representations for Cross-View Action Recognition
This approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary, which has the capability to represent actions from unseen views, and makes the approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view.
Cross-view action recognition via view knowledge transfer
A novel approach to recognizing human actions from different views by view knowledge transfer that can transfer a BoVW action model into a bag-of-bilingual-words (BoBW) model, which is more discriminative in the presence of view changes.
Heterogeneous Discriminant Analysis for Cross-View Action Recognition
This work proposes an approach of cross-view action recognition, in which the samples from different views are represented by heterogeneous features with different dimensions, and introduces a discriminative common feature space to bridge the source and target views.
Multitask Linear Discriminant Analysis for View Invariant Action Recognition
This work proposes multitask linear discriminant analysis (LDA), a novel multitask learning framework for multiview action recognition that allows for the sharing of discriminative SSM features among different views (i.e., tasks) by choosing an appropriate class indicator matrix.
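Multitask LDA builds on classical Fisher linear discriminant analysis. As background, here is a minimal single-task LDA sketch in NumPy; it is illustrative of the underlying scatter-based criterion only, not the paper's multitask formulation, and the data and regularization constant are arbitrary.

```python
import numpy as np

def lda_projection(X, y, n_components=1):
    """Fisher LDA: find directions maximizing between-class scatter
    relative to within-class scatter. X: (n_samples, n_features)."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))          # within-class scatter
    Sb = np.zeros((d, d))          # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Solve the generalized eigenproblem Sb w = lam * Sw w
    # (small ridge on Sw for numerical stability).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_components]]

rng = np.random.default_rng(1)
X0 = rng.normal([0.0, 0.0], 0.3, size=(50, 2))   # class 0
X1 = rng.normal([2.0, 2.0], 0.3, size=(50, 2))   # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
W = lda_projection(X, y)
proj = X @ W                       # 1-D projection separating the classes
print(proj[:50].mean(), proj[50:].mean())
```

The multitask variant in the paper couples such per-view discriminant problems so that SSM features are shared across views; this sketch only shows the single-view building block.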
View-Invariant Action Recognition Using Latent Kernelized Structural SVM
A novel learning algorithm is proposed for view-invariant action recognition, which extends the kernelized structural SVM framework to include latent variables and combines the low-level visual cue, mid-level correlation description, and high-level label information into a novel nonlinear kernel.
3D Action Recognition from Novel Viewpoints
  • Hossein Rahmani, A. Mian
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
The proposed human pose representation model is able to generalize to real depth images of unseen poses without the need for re-training or fine-tuning, and dramatically outperforms the existing state of the art in action recognition.
Cross-View Action Modeling, Learning, and Recognition
A novel multiview spatio-temporal and-or graph (MST-AOG) representation for cross-view action recognition, which takes advantage of the 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous multi-view video frames; the recognition does not need 3D information and is based on 2D video input.