Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey
@inproceedings{AsadiAghbolaghi2017DeepLF,
  title={Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey},
  author={Maryam Asadi-Aghbolaghi and Albert Clap{\'e}s and Marco Bellantonio and Hugo Jair Escalante and V{\'i}ctor Ponce-L{\'o}pez and Xavier Bar{\'o} and Isabelle Guyon and Shohreh Kasaei and Sergio Escalera},
  booktitle={Gesture Recognition},
  year={2017}
}
Interest in automatic action and gesture recognition has grown considerably in the last few years. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. Details of the proposed architectures, fusion strategies, main datasets, and competitions are reviewed. Also, we summarize and discuss the main works proposed so far with particular interest in how they treat the temporal dimension of data, their highlighting features, and opportunities and…
43 Citations
Multimodal 2DCNN action recognition from RGB-D data with video summarization
- Computer Science
- 2017
This work extends 2DCNN to a multimodal variant (MM2DCNN) by introducing scene flow fields as the input to an additional stream, and integrates the streams with late fusion for every summarization sequence modality, along with uniform random selection.
MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences
- Computer Science
- Expert Syst. Appl.
- 2020
Small Deep Learning Models for Hand Gesture Recognition
- Computer Science
- 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
- 2019
An approach is proposed that leverages recent progress on Convolutional Neural Networks (CNNs) and uses a deep-learning-based strategy to recognize 24 hand gestures from American Sign Language (ASL); the resulting models are feasible to deploy on resource-constrained devices and in embedded visual applications.
Similar Finger Gesture Recognition using Triplet-loss Networks
- Computer Science
- 2019 16th International Conference on Machine Vision Applications (MVA)
- 2019
A framework based on a triplet-loss network which learns to decrease the distance of true positive boundaries while increasing that of false positive ones is proposed, and a temporal representation of the segmented gesture is adopted using a stack of feature maps for gesture classification.
Action Recognition from RGB-D Data: Comparison and Fusion of Spatio-Temporal Handcrafted Features and Deep Strategies
- Computer Science
- 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
- 2017
Multimodal fusion of RGB-D data is analyzed for action recognition by using scene flow as early fusion and integrating the results of all modalities in a late fusion fashion, achieving state-of-the-art results.
Dynamic hand gesture recognition based on short-term sampling neural networks
- Computer Science
- IEEE/CAA Journal of Automatica Sinica
- 2021
A novel deep learning network for hand gesture recognition that integrates several well-proved modules together to learn both short-term and long-term features from video inputs and meanwhile avoid intensive computation.
An Incremental Learning Framework for Skeletal-based Hand Gesture Recognition with Leap Motion
- Computer Science
- 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER)
- 2019
A novel framework consisting of an incremental learning (IL) algorithm without a deep structure is proposed and applied to hand gesture classification, explicitly targeting Leap Motion (LM) data; it distinctly improves on the LSTM network in robustness and training time.
Deep signature-based isolated and large scale continuous gesture recognition approach
- Computer Science
- J. King Saud Univ. Comput. Inf. Sci.
- 2022
Learning dictionaries of kinematic primitives for action classification
- Computer Science
- 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
The method is proved to be tolerant to view point changes, and can thus support cross-view action recognition, and may be seen as a backbone of a general approach to action understanding, with potential applications in robotics.
Beyond Joints: Learning Representations From Primitive Geometries for Skeleton-Based Action Recognition and Detection
- Computer Science
- IEEE Transactions on Image Processing
- 2018
This work aims to leverage the geometric relations among joints for action recognition by introducing three primitive geometries: joints, edges, and surfaces. It dramatically outperforms the existing state-of-the-art methods for both action recognition and action detection.
References
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
- Computer Science
- ECCV
- 2016
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.…
Deep learning based super-resolution for improved action recognition
- Computer Science
- 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA)
- 2015
The experimental results obtained on a down-sampled version of a large subset of the Hollywood2 benchmark database show the importance of the proposed system in increasing the recognition rate of a state-of-the-art action recognition system for handling low-resolution videos.
Two-Stream Convolutional Networks for Action Recognition in Videos
- Computer Science
- NIPS
- 2014
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
A Key Volume Mining Deep Framework for Action Recognition
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A key volume mining deep framework to identify key volumes and conduct classification simultaneously and an effective yet simple "unsupervised key volume proposal" method for high quality volume sampling are proposed.
Two-Stream SR-CNNs for Action Recognition in Videos
- Computer Science
- BMVC
- 2016
This paper proposes a new deep architecture by incorporating human/object detection results into the framework, called two-stream semantic region based CNNs (SR-CNNs), which not only shares great modeling capacity with the original two-stream CNNs, but also exhibits the flexibility of leveraging semantic cues for action understanding.
First Person Action Recognition Using Deep Learned Descriptors
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work proposes convolutional neural networks (CNNs) for end to end learning and classification of wearer's actions and shows that the proposed network can generalize and give state of the art performance on various disparate egocentric action datasets.
Deep Learning-Based Fast Hand Gesture Recognition Using Representative Frames
- Computer Science
- 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
- 2016
A vision-based hand gesture recognition system for intelligent vehicles is proposed that uses novel tiled image patterns and tiled binary patterns within a semantic segmentation-based deep learning framework, the deconvolutional neural network; an improved classification accuracy is observed.
3D-based Deep Convolutional Neural Network for action recognition with depth sequences
- Computer Science
- Image Vis. Comput.
- 2016
Towards Good Practices for Very Deep Two-Stream ConvNets
- Computer Science
- ArXiv
- 2015
This report presents very deep two-stream ConvNets for action recognition, adapting recent very deep architectures to the video domain, and extends the Caffe toolbox with a multi-GPU implementation that has high computational efficiency and low memory consumption.
Learning Deep Features for Scene Recognition using Places Database
- Computer Science
- NIPS
- 2014
A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets and has more diversity.