A survey of depth and inertial sensor fusion for human action recognition

C. Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. Multimedia Tools and Applications.

A number of review or survey articles have previously appeared on human action recognition in which either vision sensors or inertial sensors are used individually. Since each sensor modality has its own limitations, a number of previously published papers have shown that fusing vision and inertial sensor data improves recognition accuracy. This survey article provides an overview of recent investigations in which both vision and inertial sensors are used…

Vision and Inertial Sensing Fusion for Human Action Recognition: A Review

A survey of papers in which vision and inertial sensing are used simultaneously within a fusion framework to perform human action recognition; challenges and possible future directions are also stated.

Fusion of depth, skeleton, and inertial data for human action recognition

This paper presents a human action recognition approach using the simultaneous deployment of a second-generation Kinect depth sensor and a wearable inertial sensor, showing recognition improvements when all data modalities are fused compared with when each modality is used individually.

Human Action Recognition Using Fusion of Depth and Inertial Sensors

A human action recognition system that utilizes the fusion of depth and inertial sensor measurements and achieves 95% accuracy under 8-fold cross-validation, which is not only higher than using each sensor separately but also better than the best previously reported accuracy on the dataset in question.
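The 8-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the 1-nearest-neighbour classifier and the synthetic two-cluster data are stand-in assumptions, since the actual classifier and features are not detailed here.

```python
import numpy as np

def kfold_accuracy(features, labels, k=8, seed=0):
    """Estimate accuracy with k-fold cross-validation using a
    1-nearest-neighbour classifier (a stand-in for the paper's classifier)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    folds = np.array_split(idx, k)          # k roughly equal test folds
    correct = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        for t in test:
            # Euclidean distance from the test sample to every training sample
            d = np.linalg.norm(features[train] - features[t], axis=1)
            correct += labels[train[np.argmin(d)]] == labels[t]
    return correct / len(labels)

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (40, 2)), rng.normal(3.0, 0.1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
acc = kfold_accuracy(X, y)
```

Because the folds partition the data, every sample is tested exactly once, which is what makes the reported accuracy comparable across sensor configurations.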

Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition

This paper presents the simultaneous utilization of video images and inertial signals that are captured at the same time via a video camera and a wearable inertial sensor within a fusion framework in…

Action Recognition Using Local Visual Descriptors and Inertial Data

This work uses inertial measurement units positioned on the left and right hands together with first-person vision for human action recognition, and proposes a novel statistical feature extraction method based on the curvature of the graph of a function and the tracking of left- and right-hand positions in space.

Improving human action recognition using decision level fusion of classifiers trained with depth and inertial data

An in-depth study of HAR using decision-level fusion of classifiers trained on RGB-D camera and inertial sensor data, combined via a probabilistic approach in the form of a Logarithmic Opinion Pool (LOP).
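A Logarithmic Opinion Pool fuses per-classifier class posteriors through a weighted geometric mean, P(c) ∝ ∏_k p_k(c)^{w_k}. A minimal sketch of that rule; the class count, posterior values, and equal weights below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def log_opinion_pool(posteriors, weights=None):
    """Fuse K classifiers' class posteriors (shape (K, C)) with a
    Logarithmic Opinion Pool: weighted geometric mean, renormalized."""
    posteriors = np.asarray(posteriors, dtype=float)
    k = posteriors.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)       # equal trust in each classifier
    log_p = np.log(np.clip(posteriors, 1e-12, None))  # guard against log(0)
    fused = np.exp(weights @ log_p)         # prod_k p_k(c)^{w_k}
    return fused / fused.sum()              # renormalize to a distribution

# Hypothetical posteriors: a depth-trained classifier favoring class 0
# and an inertial-trained classifier favoring class 1.
depth = [0.7, 0.2, 0.1]
inertial = [0.3, 0.6, 0.1]
fused = log_opinion_pool([depth, inertial])
```

One property that makes the LOP attractive for decision-level fusion is its veto behavior: a class assigned near-zero probability by any single classifier stays near zero after fusion.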

Simultaneous Utilization of Inertial and Video Sensing for Action Detection and Recognition in Continuous Action Streams

This paper describes the simultaneous utilization of inertial and video sensing for the purpose of achieving human action detection and recognition in continuous action streams, and indicates that the fusion approach is more effective than using each sensing modality individually.

Action Detection and Recognition in Continuous Action Streams by Deep Learning-Based Sensing Fusion

The developed fusion system is examined for two applications, one involving transition movements for home healthcare monitoring and the other involving smart TV hand gestures, both of which show the effectiveness of the developed fusion system in dealing with realistic continuous action streams.

Towards Improved Human Action Recognition Using Convolutional Neural Networks and Multimodal Fusion of Depth and Inertial Sensor Data

Experimental results on the UTD-MHAD and Kinect 2D datasets show that the proposed method achieves state-of-the-art results compared with other recently proposed visual-inertial action recognition methods.

Real-Time Continuous Detection and Recognition of Subject-Specific Smart TV Gestures via Fusion of Depth and Inertial Sensing

This paper presents a real-time detection and recognition approach to identify actions of interest involved in the smart TV application from continuous action streams via simultaneous utilization of…

Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors

The results indicate that, because of the complementary nature of the data from these sensors, the introduced fusion approaches yield recognition-rate improvements of 2% to 23%, depending on the action, over using each sensor individually.

A Real-Time Human Action Recognition System Using Depth and Inertial Sensor Fusion

A human action recognition system that runs in real time and simultaneously uses a depth camera and an inertial sensor, based on a previously developed sensor fusion method; experiments demonstrate the effectiveness of the system and its real-time throughput.

UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

A freely available dataset, named UTD-MHAD, is described; it consists of four temporally synchronized data modalities (RGB videos, depth videos, skeleton positions, and inertial signals) captured by a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions.
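Working with temporally synchronized modalities typically means resampling them to a common time base before fusion. A hedged sketch of that step, assuming a 50 Hz inertial stream and a 30 Hz depth stream (illustrative rates and function name, not read from the dataset files):

```python
import numpy as np

def align_inertial_to_depth(inertial, inertial_hz=50.0, depth_hz=30.0):
    """Resample a (T, C) inertial signal to the depth frame rate by linear
    interpolation, so both modalities share one time base."""
    inertial = np.asarray(inertial, dtype=float)
    t_src = np.arange(len(inertial)) / inertial_hz   # inertial timestamps (s)
    n_out = int(np.floor(t_src[-1] * depth_hz)) + 1  # depth frames that fit
    t_dst = np.arange(n_out) / depth_hz              # depth timestamps (s)
    return np.column_stack([np.interp(t_dst, t_src, inertial[:, c])
                            for c in range(inertial.shape[1])])

# Illustrative 2-second, 6-channel ramp signal (e.g. 3-axis accel + gyro).
imu = np.cumsum(np.ones((100, 6)), axis=0)
aligned = align_inertial_to_depth(imu)
```

Linear interpolation is the simplest choice here; per-sample timestamps recorded at capture time, when available, would give a more faithful alignment than assuming fixed rates.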

Fusion of Inertial and Depth Sensor Data for Robust Hand Gesture Recognition

It is shown that the fusion of data from the vision depth and inertial sensors acts in a complementary manner, leading to a more robust recognition outcome than when each sensor is used individually.

A survey of human motion analysis using depth imagery

Mining actionlet ensemble for action recognition with depth cameras

An actionlet ensemble model is learnt to represent each action and to capture the intra-class variance, and novel features that are suitable for depth data are proposed.

A tutorial on human activity recognition using body-worn inertial sensors

This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describes the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.

Real-Time Body Tracking with One Depth Camera and Inertial Sensors

A novel sensor fusion approach for real-time full-body tracking that succeeds in such difficult situations; it takes inspiration from previous tracking solutions and combines a generative tracker with a discriminative tracker that retrieves the closest poses in a database.

A Survey on Human Motion Analysis from Depth Data

An overview is given of recent approaches to human motion analysis, including depth-based and skeleton-based activity recognition, head pose estimation, facial feature detection, facial performance capture, hand pose estimation, and hand gesture recognition.

Human Action Recognition With Video Data: Research and Evaluation Challenges

This survey provides an overview of the existing methods based on their ability to handle these challenges, as well as how these methods can be generalized and their ability to detect abnormal actions.