Unsupervised Learning Spatio-temporal Features for Human Activity Recognition from RGB-D Video Data