Multiview Cauchy Estimator Feature Embedding for Depth and Inertial Sensor-Based Human Action Recognition

  • Yanan Guo, Dapeng Tao, Weifeng Liu, Jun Cheng
  • Published 7 August 2016
  • Computer Science
  • IEEE Transactions on Systems, Man, and Cybernetics: Systems
The ever-growing popularity of Kinect and inertial sensors has prompted intensive research efforts on human action recognition. By minimizing empirical risk, the proposed multiview Cauchy estimator feature embedding (MCEFE) method integrates the encoded complementary information in multiple views to find a unified data representation and the projection matrices. To enhance robustness to outliers, the Cauchy estimator is imposed on the reconstruction error.
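The robustness claim rests on a standard property of the Cauchy M-estimator: its penalty grows only logarithmically in the residual, so outlying reconstruction errors are down-weighted relative to a squared loss. A minimal sketch of that loss (the function name and scale parameter `c` are illustrative, not taken from the paper):

```python
import numpy as np

def cauchy_loss(residual, c=1.0):
    # Cauchy (Lorentzian) M-estimator penalty: grows logarithmically in the
    # residual, so outliers contribute far less than under squared (L2) loss.
    return (c ** 2 / 2.0) * np.log1p((np.asarray(residual) / c) ** 2)

# An outlier residual of 10 is penalized far less than under squared loss:
print(cauchy_loss(10.0))   # ~2.31
print(0.5 * 10.0 ** 2)     # 50.0
```

Minimizing this penalty over reconstruction errors, instead of the usual squared error, is what gives the embedding its tolerance to corrupted samples.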


Using a Multilearner to Fuse Multimodal Features for Human Action Recognition

A human action recognition method based on RGB-D image features that makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features, achieving strong recognition results on the public G3D and CAD60 datasets.

Real-time action recognition by feature-level fusion of depth and inertial sensor

A novel approach for human action recognition based on feature-level fusion of depth and inertial sensor data, which has low computational complexity and can be employed in real-time systems.
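Feature-level fusion of the kind this entry describes is commonly done by normalizing each modality's per-frame descriptor and concatenating them before classification. A generic illustration (the function name, feature dimensions, and normalization choice are assumptions, not the cited paper's exact pipeline):

```python
import numpy as np

def fuse_features(depth_feat, inertial_feat, eps=1e-8):
    # z-score each modality independently so neither dominates by scale,
    # then concatenate per frame (feature-level fusion).
    d = (depth_feat - depth_feat.mean(axis=0)) / (depth_feat.std(axis=0) + eps)
    i = (inertial_feat - inertial_feat.mean(axis=0)) / (inertial_feat.std(axis=0) + eps)
    return np.hstack([d, i])

# Illustrative shapes: 100 frames, a 192-dim depth descriptor and a 6-dim
# inertial sample (3-axis accelerometer + 3-axis gyroscope).
fused = fuse_features(np.random.rand(100, 192), np.random.rand(100, 6))
print(fused.shape)  # (100, 198)
```

The fused matrix then feeds a single classifier, in contrast to decision-level fusion, where each modality is classified separately and the scores are combined.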

Motion Trajectory for Human Action Recognition Using Fourier Temporal Features of Skeleton Joints

This work focuses on skeleton-based human activity recognition and proposes a motion trajectory computation scheme using Fourier temporal features from the interpolation of skeleton joints of the human body, treating human motion as a trajectory of skeleton joints.
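The core idea of a Fourier temporal feature can be sketched on a single joint coordinate: take the DFT of its trajectory over a clip and keep the low-frequency magnitudes as a fixed-length descriptor. A toy example with a synthetic trajectory (the clip length and coefficient count are illustrative assumptions):

```python
import numpy as np

# Synthetic trajectory of one joint coordinate over a 64-frame clip.
T = 64
t = np.arange(T)
trajectory = np.sin(2 * np.pi * 3 * t / T)   # motion repeating 3 times per clip

# Fourier temporal feature: magnitudes of the low-frequency DFT coefficients
# give a fixed-length, phase-insensitive summary of the motion's shape.
spectrum = np.abs(np.fft.rfft(trajectory))
feature = spectrum[:8]
print(feature.argmax())  # 3: the energy sits at the 3-cycles-per-clip bin
```

Because magnitudes discard phase, the descriptor is insensitive to where in the clip the motion starts, which is one reason Fourier features suit temporally misaligned action clips.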

Effective human action recognition by combining manifold regularization and pairwise constraints

A novel local structure preserving approach that considers both manifold regularization and pairwise constraints is introduced; it better preserves the local geometry of the data distribution and achieves effective human action recognition.

Semi-supervised Hessian Eigenmap for Human Action Recognition

The experimental results demonstrate that the proposed semi-supervised Hessian Eigenmap algorithm outperforms the representative semi-supervised Laplacian Eigenmap algorithm.

Vision and Inertial Sensing Fusion for Human Action Recognition: A Review

A survey of the papers in which vision and inertial sensing are used simultaneously within a fusion framework in order to perform human action recognition, and challenges as well as possible future directions are stated.

Local Structure Preserving Using Manifold Regularization and Pairwise Constraints for Action Recognition

A local structure preserving method that effectively integrates manifold regularization and pairwise constraints is proposed: a new graph Laplacian is constructed by combining the traditional Laplacian with pairwise constraints, and the method outperforms the baseline algorithms.

Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors.

The proposed frameworks are evaluated on three publicly available multimodal HAR datasets, namely the UTD Multimodal Human Action Dataset (MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2, demonstrating the superiority of the proposed fusion frameworks over existing methods.

Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors

The results indicate that because of the complementary aspect of the data from these sensors, the introduced fusion approaches lead to 2% to 23% recognition rate improvements depending on the action over the situations when each sensor is used individually.

UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

A freely available dataset, named UTD-MHAD, is described; it consists of four temporally synchronized data modalities (RGB videos, depth videos, skeleton positions, and inertial signals) captured by a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions.

Berkeley MHAD: A comprehensive Multimodal Human Action Database

A controlled multimodal dataset consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers and microphones, provides researchers an inclusive testbed to develop and benchmark new algorithms across multiple modalities under known capture conditions in various research domains.

Mining discriminative states of hands and objects to recognize egocentric actions with a wearable RGBD camera

  • Shaohua Wan, J. Aggarwal
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2015
This work presents a novel model that automatically mines discriminative states for recognizing egocentric actions and proposes a novel kernel function and a Multiple Kernel Learning based framework to learn adaptive weights for different states.

Characterizing Humans on Riemannian Manifolds

This paper shows how to extend the approach to the multi-classification case, presenting a novel descriptor, named weighted array of covariances, especially suited for dealing with tiny image representations, and adopts the Campbell-Baker-Hausdorff expansion as a means to efficiently approximate on the tangent space the genuine distances on the manifold.

Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera

  • Lu Xia, J. Aggarwal
  • Computer Science
    2013 IEEE Conference on Computer Vision and Pattern Recognition
  • 2013
A filtering method to extract STIPs from depth videos (called DSTIP) that effectively suppresses noisy measurements is presented, and a novel depth cuboid similarity feature (DCSF) is built to describe the local 3D depth cuboids around the DSTIPs with an adaptable supporting size.

Bayesian Co-Boosting for Multi-modal Gesture Recognition

This paper proposes a novel Bayesian Co-Boosting framework for multi-modal gesture recognition; inspired by boosting learning and the co-training method, it combines multiple collaboratively trained weak classifiers to construct the final strong classifier for the recognition task.

Spatio-temporal cuboid pyramid for action recognition using depth motion sequences

An effective method to recognize human actions from sequences of depth maps, captured by a consumer depth sensor, is presented, and a spatio-temporal cuboid pyramid (STCP) is proposed to subdivide the depth motion sequence (DMS) volumes into a set of spatial cuboids on scaled temporal levels.

Depth video-based human activity recognition system using translation and scaling invariant features for life logging at smart home

This work presents a novel depth video-based, translation and scaling invariant human activity recognition (HAR) system utilizing the R transformation of depth silhouettes, and demonstrates that the proposed method is robust, reliable, and efficient in recognizing daily human activities.

Fusion of Inertial and Depth Sensor Data for Robust Hand Gesture Recognition

It is shown that the fusion of data from the vision depth and inertial sensors acts in a complementary manner, leading to a more robust recognition outcome compared with situations where each sensor is used individually on its own.