Corpus ID: 240354595

On-device Real-time Hand Gesture Recognition

@article{Sung2021OndeviceRH,
  title={On-device Real-time Hand Gesture Recognition},
  author={George Sung and Kanstantsin Sokal and Esha Uboweja and Valentin Bazarevsky and Jonathan Baccash and Eduard Gabriel Bazavan and Chuo-Ling Chang and Matthias Grundmann},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.00038}
}
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands [14, 2] as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural… 
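
The abstract mentions a heuristics-based gesture classifier but this excerpt does not specify its rules. As a rough illustration of the general idea only, the sketch below classifies a static gesture from 21 hand keypoints by checking which fingers are extended. It assumes the 21-landmark MediaPipe Hands index layout (0 = wrist, fingertips at 8, 12, 16, 20). The finger-extension test, the gesture names, and the decision to ignore the thumb are hypothetical choices made for this example, not the authors' method.

```python
import math

# Assumed 21-landmark MediaPipe Hands layout: 0 = wrist, then four points
# per finger. Each entry maps a finger to its (PIP joint, fingertip) indices.
# The thumb is deliberately ignored in this simplified sketch.
WRIST = 0
FINGERS = {
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}

# Hypothetical gesture vocabulary keyed by the set of extended fingers.
GESTURES = {
    frozenset(): "fist",
    frozenset({"index"}): "pointing",
    frozenset({"index", "middle"}): "victory",
    frozenset({"index", "middle", "ring", "pinky"}): "open_palm",
}


def extended_fingers(keypoints):
    """Return the fingers whose tip lies farther from the wrist than its PIP joint."""
    wrist = keypoints[WRIST]
    return {
        name
        for name, (pip, tip) in FINGERS.items()
        if math.dist(keypoints[tip], wrist) > math.dist(keypoints[pip], wrist)
    }


def classify_gesture(keypoints):
    """Map 21 (x, y, z) keypoints to a gesture label, or "unknown"."""
    if len(keypoints) != 21:
        raise ValueError("expected 21 hand keypoints")
    return GESTURES.get(frozenset(extended_fingers(keypoints)), "unknown")
```

Because the rule compares distances measured from the wrist, it does not depend on how the hand is rotated in the image, which is one reason a metric 3D keypoint space is convenient for hand-written rules of this kind. A real system would need more robust tests (joint angles, thumb handling, per-gesture thresholds) than this sketch provides.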

BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation
TLDR
The main contributions include i) a novel method for 3D ground truth data acquisition, ii) updated 3D body tracking with additional hand landmarks and iii) full body pose estimation from a monocular image.
Internet of things system for ultraviolet index monitoring in the community of Chirinche Bajo
The impact of the ultraviolet radiation index is becoming more intense and dangerous for the health of the epidermis and eyesight of people, especially for farmers in the Chirinche Bajo (Ecuador)

References

Skeleton-Based Dynamic Hand Gesture Recognition
TLDR
The geometric shape of the hand is exploited to extract an effective descriptor from the connected hand-skeleton joints returned by the Intel RealSense depth camera, and classification is performed by a linear SVM classifier.
Hand Gesture Recognition Based on Computer Vision: A Review of Techniques
TLDR
A review of the literature on hand gesture techniques that introduces their merits and limitations under different circumstances and tabulates the performance of these methods, focusing on computer vision techniques and the points of similarity and difference among them.
MediaPipe Hands: On-device Real-time Hand Tracking
TLDR
A real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications, implemented via MediaPipe, a framework for building cross-platform ML solutions.
Deep Learning for Hand Gesture Recognition on Skeletal Data
TLDR
A new Convolutional Neural Network (CNN) is proposed in which sequences of hand-skeletal joints' positions are processed by parallel convolutions; the model achieves state-of-the-art performance on a challenging dataset.
Vision based hand gesture recognition for human computer interaction: a survey
TLDR
An analysis of comparative surveys in the field of gesture-based HCI is provided, together with an analysis of the existing literature on gesture recognition systems for human-computer interaction, categorized under different key parameters.
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows
TLDR
This paper shows that the proposed methods outperform the state of the art, supporting the practical construction of an accurate family of models based on large-scale training with diverse and incompletely labeled image and video data.
GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models
TLDR
A statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework, that supports facial expression analysis, as well as body shape and pose estimation.
MediaPipe: A Framework for Building Perception Pipelines
TLDR
This work shows that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms.