NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Abstract

Recent approaches to depth-based human activity analysis have achieved outstanding performance and proved the effectiveness of 3D representations for the classification of action classes. However, currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including a lack of training samples, distinct class labels, camera views, and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition, with more than 56 thousand video samples and 4 million frames collected from 40 distinct subjects. Our dataset contains 60 different action classes, including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure that models the long-term temporal correlation of the features of each body part and utilizes them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art handcrafted features under the proposed cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large-scale dataset will enable the community to apply, develop, and adapt various data-hungry learning techniques for depth-based and RGB+D-based human activity analysis.

DOI: 10.1109/CVPR.2016.115
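The abstract describes a recurrent network that models each body part's features over time and combines them for classification. A minimal sketch of that part-based idea is shown below, using plain NumPy and a vanilla tanh RNN; the joint grouping, layer sizes, and random weights are illustrative assumptions, not the paper's actual architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

T, J, D = 20, 25, 3   # frames, joints (Kinect v2 skeleton), 3D coordinates
H, C = 16, 60         # hidden units per part, 60 action classes

# Illustrative split of the 25 joints into five body-part groups
# (indices are assumptions for this sketch).
PARTS = {
    "torso":     [0, 1, 2, 3, 20],
    "left_arm":  [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def rnn_last_state(x, Wx, Wh, b):
    """Run a vanilla tanh RNN over time; return the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)
    return h

def part_based_features(seq):
    """seq: (T, J, D) skeleton sequence -> concatenated per-part final states."""
    states = []
    for joints in PARTS.values():
        x = seq[:, joints, :].reshape(T, -1)             # flatten this part's joints
        Wx = rng.standard_normal((x.shape[1], H)) * 0.1  # random weights: sketch only
        Wh = rng.standard_normal((H, H)) * 0.1
        states.append(rnn_last_state(x, Wx, Wh, np.zeros(H)))
    return np.concatenate(states)                        # shape: (5 * H,)

seq = rng.standard_normal((T, J, D))      # one synthetic skeleton sequence
feat = part_based_features(seq)
logits = feat @ (rng.standard_normal((feat.size, C)) * 0.1)
print(feat.shape, logits.shape)           # (80,) (60,)
```

In a trained model the per-part weights would be learned (the paper uses gated recurrent units rather than a vanilla RNN), but the structural point carries over: each body part gets its own recurrent pathway, and the final states are fused before the classifier.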
