Recognizing and tracking clasping and occluded hands

  • John R. Zhang, John R. Kender
  • Published 1 September 2013
  • Computer Science
  • 2013 IEEE International Conference on Image Processing
We present a purely algorithmic method for distinguishing when two hands are visually merged together and tracking their positions by propagating tracking information from anchor frames in single-camera video without depth information. We demonstrate and evaluate on a manually labeled dataset selected primarily for clasped hands with 698 images of a single speaker with 1301 annotated left and right hands. Toward the goal of recognizing clasping hands, our method performs better than baseline on… 

Long Term Hand Tracking with Proposal Selection

  • Qingshuang Chen, F. Zhu
  • Computer Science
  • 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)
  • 2018
This paper uses color and motion features for the hand tracker and obtains hand detection proposals using a Faster Region-based Convolutional Neural Network (Faster R-CNN) and a stacked hourglass network for human pose estimation, which provides possible hand regions based on the spatial relationship between the wrist and other joints.

Correlating Speaker Gestures in Political Debates with Audience Engagement Measured via EEG

Gesture attributes derived from speakers' tracked hand motions are proposed to automatically quantify gestures from video, and a correlation between these attributes and an objective measure of audience engagement, electroencephalography (EEG), is demonstrated in the domain of political debates.

Volcanic Unrest and Pre-eruptive Processes: A Hazard and Risk Perspective

Volcanic unrest is complex and capable of producing multiple hazards that can be triggered by a number of different subsurface processes. Scientific interpretations of unrest data aim to better



Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts

The goal of this work is to detect hand and arm positions over continuous sign language video sequences of more than one hour in length. The method is shown to identify the true arm and hand locations.

Hand detection using multiple proposals

A hand detector is presented that exceeds the state of the art on two public datasets, including the PASCAL VOC 2010 human layout challenge, and a fully annotated hand dataset for training and testing is introduced.

Learning sign language by watching TV (using weakly aligned subtitles)

This work proposes a distance function for matching signing sequences that includes the trajectory of both hands and the hand shape and orientation, and that properly models the case of hands touching. It shows that, by optimizing a scoring function based on multiple instance learning, the sign of interest can be extracted from hours of signing footage despite very weak and noisy supervision.

Robust Real-Time Face Detection

A new image representation called the “Integral Image” is introduced, which allows the features used by the detector to be computed very quickly, along with a method for combining classifiers in a “cascade” that allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.

The Pascal Visual Object Classes (VOC) Challenge

The state of the art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.

Minimal Training, Large Lexicon, Unconstrained Sign Language Recognition

A flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign is presented.

Two-Frame Motion Estimation Based on Polynomial Expansion

A method to estimate displacement fields from the polynomial expansion coefficients is derived and, after a series of refinements, leads to a robust algorithm that shows good results on the Yosemite sequence.

Automatic Feature Construction and a Simple Rule Induction Algorithm for Skin Detection

A new constructive induction algorithm is presented that creates adequate attributes for skin detection by using a new restricted covering algorithm, called RCA, as its learning component, which produces a single rule with competitive results when compared against C4.5.

SUN database: Large-scale scene recognition from abbey to zoo

This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.