Corpus ID: 208175419

Action Recognition Using Volumetric Motion Representations

by Michael Peven, Gregory Hager, and Austin Reiter
Traditional action recognition models are built around 2D perspective imagery. Although sophisticated time-series models have pushed the field forward, much of the available information remains unexploited when the domain is confined to 2D. In this work, we introduce a novel representation of motion as a voxelized 3D vector field and demonstrate how it can be used to improve the performance of action recognition networks. This volumetric representation is a natural fit for 3D CNNs, and…
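To make the core idea concrete, the sketch below discretizes a set of 3D motion vectors into a voxel grid, storing one averaged vector per occupied voxel. This is a hypothetical illustration, not the authors' code; the grid size, bounds, and input format are assumptions.

```python
# Hypothetical sketch (not the paper's implementation): bin 3D motion vectors
# attached to 3D points into a regular voxel grid, averaging the vectors that
# fall into each voxel. Grid resolution and bounds are illustrative assumptions.

def voxelize_motion(points, vectors, grid=(32, 32, 32),
                    bounds=((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))):
    """points: iterable of (x, y, z); vectors: matching iterable of (dx, dy, dz).
    Returns a sparse volumetric vector field: voxel index -> mean motion vector."""
    acc = {}  # voxel index -> [sum_dx, sum_dy, sum_dz, count]
    for (x, y, z), v in zip(points, vectors):
        idx = tuple(
            min(grid[d] - 1, max(0, int((c - lo) / (hi - lo) * grid[d])))
            for d, (c, (lo, hi)) in enumerate(zip((x, y, z), bounds))
        )
        s = acc.setdefault(idx, [0.0, 0.0, 0.0, 0])
        for k in range(3):
            s[k] += v[k]
        s[3] += 1
    return {idx: tuple(s[k] / s[3] for k in range(3)) for idx, s in acc.items()}
```

A dense grid version of this field (one 3-vector per voxel) is what a 3D CNN would consume directly as a multi-channel volumetric input.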

3D Convolutional Neural Networks for Human Action Recognition

A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.

Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks

A new representation, namely, Scene Flow to Action Map (SFAM), that describes several long-term spatio-temporal dynamics for action recognition from RGB-D data and takes better advantage of ConvNet models pre-trained on ImageNet.

Two-Stream Convolutional Networks for Action Recognition in Videos

This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
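The two-stream idea summarized above can be sketched as a simple late fusion of per-class scores from a spatial (RGB) stream and a temporal (optical-flow) stream. The function below is an illustrative stand-in, not the paper's implementation; the equal weighting is an assumption.

```python
# Hedged sketch of two-stream late fusion: combine per-class scores from a
# spatial stream and a temporal stream. The weight w is an illustrative
# assumption, not a value from the paper.

def fuse_streams(spatial_scores, temporal_scores, w=0.5):
    """Weighted average of two aligned per-class score lists."""
    return [w * s + (1.0 - w) * t for s, t in zip(spatial_scores, temporal_scores)]

def predict(spatial_scores, temporal_scores):
    """Return the index of the highest fused class score."""
    fused = fuse_streams(spatial_scores, temporal_scores)
    return max(range(len(fused)), key=fused.__getitem__)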

Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

  • Tae Soo Kim, A. Reiter
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2017
This work proposes to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition, and aims to take a step towards a spatio-temporal model that is easier to understand, explain and interpret.

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

This paper proposes a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images and introduces a Markov chain model which adds cues successively.

Learning Action Recognition Model from Depth and Skeleton Videos

  • H. Rahmani, Bennamoun
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
A deep model which efficiently models human-object interactions and intra-class variations under viewpoint changes and an end-to-end learning framework which is able to effectively combine the view-invariant body-part representation from skeletal and depth images, and learn the relations between the human body-parts and the environmental objects.

Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks

Factorized spatio-temporal convolutional networks (FstCN) are proposed that factorize the original 3D convolution kernel learning as a sequential process of learning 2D spatial kernels in the lower layers, followed by learning a 1D temporal kernel in the upper layers.
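One motivation for this factorization is the reduction in parameters: a full 3D kernel couples space and time, while the factorized form learns them separately. The arithmetic below is an illustrative back-of-the-envelope comparison, not figures from the paper.

```python
# Illustrative parameter-count arithmetic (assumption, not from the paper):
# a full k x k x t 3D kernel has k*k*t weights per channel pair, while a
# factorized 2D spatial (k x k) plus 1D temporal (t) pair has k*k + t.

def params_3d(k, t):
    """Weights in one full 3D convolution kernel of spatial size k, depth t."""
    return k * k * t  # e.g. 3x3x3 -> 27

def params_factorized(k, t):
    """Weights in a 2D spatial kernel plus a separate 1D temporal kernel."""
    return k * k + t  # e.g. 3x3 + 3 -> 12
```

For the common 3x3x3 case this is 27 versus 12 weights, and the gap widens as the temporal extent grows.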

Recurrent Attention Models for Depth-Based Person Identification

An attention-based model is presented that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark; it produces state-of-the-art results on several published datasets given only depth images.

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

A large-scale dataset for RGB+D human action recognition, with more than 56 thousand video samples and 4 million frames collected from 40 distinct subjects, is introduced, and a new recurrent neural network structure is proposed to model the long-term temporal correlation of the features for each body part and utilize them for better action classification.