Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

@article{Niebles2007UnsupervisedLO,
  title={Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words},
  author={Juan Carlos Niebles and Hongcheng Wang and Li Fei-Fei},
  journal={International Journal of Computer Vision},
  year={2007},
  volume={79},
  pages={299-318}
}
We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA).
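The abstract describes a two-stage pipeline: quantize space-time interest point descriptors into a codebook of spatial-temporal words, then fit a latent topic model over the per-video word counts so that topics align with action categories. As a rough illustration of the second stage only, below is a minimal sketch of pLSA fit by EM to a codeword-by-video count matrix. This is textbook pLSA under assumed names (plsa, counts, n_topics), not the authors' implementation.

import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    # counts: (n_words, n_videos) matrix of spatial-temporal word counts.
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    # Random initialisation of P(w|z) and P(z|d), each normalised over axis 0.
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(0)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(0)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|w,d), shape (n_words, n_topics, n_docs).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by the observed counts.
        weighted = joint * counts[:, None, :]
        p_w_z = weighted.sum(2)
        p_w_z /= p_w_z.sum(0) + 1e-12
        p_z_d = weighted.sum(0)
        p_z_d /= p_z_d.sum(0) + 1e-12
    return p_w_z, p_z_d

# A new video can then be categorised by folding it in (re-estimating its
# P(z|d) column with P(w|z) held fixed) and taking the topic with the
# highest posterior, treating topics as latent action categories.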

Citations

Hybrid generative-discriminative human action recognition by combining spatiotemporal words with supervised topic models
A hybrid generative-discriminative learning method for human action recognition from video sequences that combines a bag-of-words component with supervised latent topic models and is extended to exploit both labeled and unlabeled data to learn human actions under a unified framework.
Unsupervised Human Action Categorization Using Latent Dirichlet Markov Clustering
  • Xudong Zhu, H. Li
  • Computer Science
  • 2012 Fourth International Conference on Intelligent Networking and Collaborative Systems
  • 2012
A novel unsupervised learning method for human action categories from video sequences using Latent Dirichlet Markov Clustering (LDMC); a new approximation to online Bayesian inference is formulated to enable real-time classification of human actions in new video data.
Human action recognition using labeled Latent Dirichlet Allocation model
A new action recognition method that represents human actions as a bag of spatio-temporal words extracted from input video sequences and uses a labeled Latent Dirichlet Allocation (L-LDA) model as a classifier, outperforming both its unsupervised counterpart LDA and support vector machines (SVMs).
Projection transform on spatio-temporal context for action recognition
  • Wanru Xu, Z. Miao, Qiang Zhang
  • Computer Science
  • Multimedia Tools and Applications
  • 2014
Through an analysis of feature distributions and their interactions over the spatio-temporal domain, a novel projection transform is proposed that takes both factors into account and outperforms previously published results on the Weizmann and KTH datasets.
Spatial-Temporal Context for Action Recognition Combined with Confidence and Contribution Weight
  • Wanru Xu, Z. Miao, Jian Zhang, Qiang Zhang, Haohao Wu
  • Computer Science
  • 2013 2nd IAPR Asian Conference on Pattern Recognition
  • 2013
A new method for human action analysis in videos, based on an analysis of feature distributions and their interactions over the spatial-temporal domain, that outperforms previously published results on the Weizmann and KTH datasets.
Structured Time Series Analysis for Human Action Segmentation and Recognition
By combining a temporal segmentation algorithm with an alignment algorithm, online human action recognition can be performed using a few labeled examples from motion capture data; the ability of the transfer learning module to handle noisy and partially occluded data is also demonstrated.
Human Abnormal Action Identification Method in Different Scenarios
Using a double-layer bag-of-words model, human actions in a new video can be categorized as normal or abnormal for a particular setting, so that appropriate security measures can be taken.
Two-stream spatiotemporal feature fusion for human action recognition
This paper proposes a novel human action recognition method by fusing spatial and temporal features learned from a simple unsupervised convolutional neural network called principal component analysis network (PCANet), in combination with bag-of-features (BoF) and vector of locally aggregated descriptors (VLAD) encoding schemes.
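Since the summary above names the VLAD encoding scheme, a short sketch of standard VLAD may help make it concrete: local descriptors are hard-assigned to their nearest codeword, per-codeword residuals are accumulated, and the result is power- and L2-normalised. The k-means codebook and all names (vlad_encode, codebook) are assumptions for illustration, not this paper's code.

import numpy as np

def vlad_encode(descriptors, codebook):
    # descriptors: (n, d) local features; codebook: (k, d) k-means centres.
    # Hard-assign each descriptor to its nearest codeword.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(1)
    k, d = codebook.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Accumulate residuals between descriptors and their codeword.
            vlad[i] = (members - codebook[i]).sum(0)
    vlad = vlad.ravel()
    # Signed square-root (power) normalisation, then global L2 normalisation.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    return vlad / (np.linalg.norm(vlad) + 1e-12)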
Human Action Recognition and Localization in Video Using Structured Learning of Local Space-Time Features
A unified framework for human action classification and localization in video using structured learning of local space-time features; dynamic conditional random fields are developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features.
Recognizing human action and identity based on affine-SIFT
  • Zhuo Zhang, Jia Liu
  • Mathematics
  • 2012 IEEE Symposium on Electrical & Electronics Engineering (EEESYM)
  • 2012
This paper presents a novel method based on an Affine-SIFT detector to capture motion for human action recognition. More specifically, we propose a new action representation based on computing a rich…

References

Showing 1-10 of 51 references
A Hierarchical Model of Shape and Appearance for Human Action Classification
A hierarchical model, characterized as a constellation of bags-of-features that combines both spatial and spatial-temporal features, is proposed and shown to improve classification performance over bag-of-features models.
Unsupervised Learning of Human Action Categories
The approach is not only able to classify different actions, but also to localize different actions simultaneously in a novel and complex video sequence.
Kernel-based Recognition of Human Actions Using Spatiotemporal Salient Points
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are…
Spatiotemporal salient points for visual recognition of human actions
This paper addresses the problem of human-action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are…
Efficient visual event detection using volumetric features
A real-time event detector is constructed for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time; it achieves performance comparable to a current interest-point-based human activity recognizer on a standard database of human activities.
Actions as Space-Time Shapes
The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known; its robustness to partial occlusions, nonrigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video is demonstrated.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
A novel unsupervised learning method for human action categories is presented that represents a video sequence as a collection of spatial-temporal words by extracting space-time interest points.
Unsupervised Learning of Human Motion
An unsupervised learning algorithm is presented that automatically obtains, from unlabeled training data, a probabilistic model of an object composed of a collection of parts.
Recognizing action at a distance
A novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure is introduced, along with an associated similarity measure for use in a nearest-neighbor framework.
Hybrid models for human motion recognition
This paper focuses on methods that represent the human motion model as a triangulated graph, introduces global variables that capture properties such as translation, scale, or viewpoint, and shows that the suggested hybrid probabilistic model leads to faster convergence of the learning phase and a higher recognition rate.