A Hierarchical Model of Shape and Appearance for Human Action Classification

@inproceedings{Niebles2007AHM,
  title={A Hierarchical Model of Shape and Appearance for Human Action Classification},
  author={Juan Carlos Niebles and Li Fei-Fei},
  booktitle={2007 IEEE Conference on Computer Vision and Pattern Recognition},
  year={2007},
  pages={1--8}
}
We present a novel model for human action categorization. A video sequence is represented as a collection of spatial and spatial-temporal features by extracting static and dynamic interest points. We propose a hierarchical model that can be characterized as a constellation of bags-of-features and that is able to combine both spatial and spatial-temporal features. Given a novel video sequence, the model is able to categorize human actions on a frame-by-frame basis. We test the model on a…
Learning human actions in video
TLDR
This dissertation develops state-of-the-art feature extraction algorithms that robustly encode video information for both action classification and action localization on realistic video data, and proposes two new approaches to describing local features in videos.
Combining Models of Pose and Dynamics for Human Motion Recognition
TLDR
A novel method for human motion recognition that encodes the spatial-temporal relationships between the dynamics of the motion and the appearance of individual poses, and a higher-level model that can be described as a "constellation of constellation models".
Learning a hierarchy of discriminative space-time neighborhood features for human action recognition
TLDR
This work proposes to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category by extracting local motion and appearance features, quantizing them to a visual vocabulary, and forming candidate neighborhoods that form the most informative configurations.
Action Recognition by Multiple Features and Hyper-Sphere Multi-class SVM
TLDR
A novel framework for action recognition that uses multiple features to improve action recognition in videos. It combines two kinds of features: a quantized vocabulary of local spatio-temporal volumes, and higher-order statistical models of interest points, which aim to capture the global information of the actor.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
TLDR
A novel unsupervised learning method for human action categories that can recognize and localize multiple actions in long and complex video sequences containing multiple motions.
Action recognition via multi-feature fusion and Gaussian process classification
TLDR
The experimental results have shown that the fusion of multiple features improved the recognition accuracy compared with the use of any single feature type, and the redundancy of fused features can be reduced by spectral feature analysis.
Region-based Mixture Models for human action recognition in low-resolution videos
TLDR
The Layered Elastic Motion Tracking (LEMT) method is adopted, a hybrid feature representation is presented to integrate both shape and motion features, and a Region-based Mixture Model (RMM) is proposed for action classification.
Retrieving Human Actions Using Spatio-Temporal Features and Relevance Feedback
In this paper, we extend the idea of 2D object retrieval to 3D human action retrieval and present a solution for action retrieval with spatio-temporal features. The framework of this action
Learning semantic features for action recognition via diffusion maps
TLDR
This paper presents a principled approach to learning a semantic vocabulary from a large amount of video words using Diffusion Maps embedding, and conjectures that the mid-level features produced by similar video sources must lie on a certain manifold.
Spatio-Temporal Frames in a Bag-of-Visual-Features Approach for Human Actions Recognition
TLDR
This work proposes to build a BoVF representation for videos by collecting 2D interest points directly from the traditional frames, and assumes that such features are able to capture dynamic information from the videos, and are well-suited to recognize human actions from them, without the need of 3D extensions for the descriptors.

References

SHOWING 1-10 OF 20 REFERENCES
Actions as Space-Time Shapes
TLDR
The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known, and the robustness of the method is demonstrated to partial occlusions, nonrigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
Hybrid models for human motion recognition
TLDR
This paper focuses on methods that represent the human motion model as a triangulated graph and introduces global variables in the model, which can represent global properties such as translation, scale or viewpoint. It shows that the suggested hybrid probabilistic model leads to faster convergence of the learning phase and a higher recognition rate.
Hierarchical part-based visual object categorization
  • Guillaume Bouchard, B. Triggs
  • Mathematics, Computer Science
    2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
  • 2005
We propose a generative model that codes the geometry and appearance of generic visual object categories as a loose hierarchy of parts, with probabilistic spatial relations linking parts to subparts,
Unsupervised Learning of Human Action Categories
TLDR
The approach is not only able to classify different actions, but also to localize different actions simultaneously in a novel and complex video sequence.
Recognizing action at a distance
TLDR
A novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure is introduced, along with an associated similarity measure for use in a nearest-neighbor framework.
Learning the Statistics of People in Images and Video
TLDR
The paper provides a detailed analysis of the statistics of how people appear in scenes and provides a connection between work on natural image statistics and the Bayesian tracking of people.
Discovering objects and their location in images
TLDR
This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and applies a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).
Unsupervised Learning of Human Motion
TLDR
An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts automatically from unlabeled training data is presented.
Automatic Annotation of Everyday Movements
TLDR
A system that can annotate a video sequence with a description of the appearance of each actor; when the actor is in view; and a representation of the actor's activity while in view is described.
Behavior recognition via sparse spatio-temporal features
TLDR
It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.