Reconstructing an arbitrary configuration of 3D points from their projection in an image is an ill-posed problem. When the points hold semantic meaning, such as anatomical landmarks on a body, human observers can often infer a plausible 3D configuration, drawing on extensive visual memory. We present an activity-independent method to recover the 3D…
The human body is structurally symmetric. Tracking-by-detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed, can critically affect subsequent processes, such as action recognition, affordance…
State-of-the-art approaches for articulated human pose estimation are rooted in parts-based graphical models. These models are often restricted to tree-structured representations and simple parametric potentials in order to enable tractable inference. However, these simple dependencies fail to capture all the interactions between body parts. While models…
Pose Machines provide a powerful modular framework for articulated pose estimation. The sequential prediction framework allows for the learning of rich implicit spatial models, but currently relies on manually designed features for representing image and spatial context. In this work, we incorporate a convolutional network architecture into the pose machine…
This paper presents a method for acquiring dense non-rigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models…
We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation. Current state-of-the-art approaches learn a single model and modify inference procedures to produce a small number of diverse predictions. We take the…
We evaluate the performance of a widely used tracking-by-detection and data association multi-target tracking pipeline applied to an activity-rich video dataset. In contrast to traditional work on multi-target pedestrian tracking where people are largely assumed to be upright, we use an activity-rich dataset that includes a wide range of body poses derived…