Lifting Monocular Events to 3D Human Poses

Gianluca Scarpellini, Pietro Morerio, Alessio Del Bue
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
This paper presents a novel 3D human pose estimation approach using a single stream of asynchronous events as input. Most state-of-the-art approaches solve this task with RGB cameras, but they struggle when subjects move fast. Event-based 3D pose estimation, on the other hand, benefits from the advantages of event cameras, especially their efficiency and robustness to appearance changes. Yet, finding human poses in asynchronous events is in general more challenging than standard…

EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera
This paper proposes EventCap, the first approach for 3D capture of high-speed human motions using a single event camera. It combines model-based optimization and CNN-based human pose detection to capture high-frequency motion details and to reduce drift in the tracking.
DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
This work presents the Dynamic Vision Sensor Human Pose dataset (DHP19), a novel benchmark dataset of human body movements consisting of recordings from 4 synchronized 346x260 pixel DVS cameras for a set of 33 movements with 17 subjects, and reports an average 3D pose estimation error of about 8 cm.
Taskonomy: Disentangling Task Transfer Learning
This work proposes a fully computational approach for modeling the structure of the space of visual tasks by finding (first- and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space, and provides a computational taxonomic map for task transfer learning.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. It shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
CHAPTER 2 – 3D Transformations
High Speed and High Dynamic Range Video with an Event Camera
This work proposes a novel recurrent network to reconstruct videos from a stream of events and trains it on a large amount of simulated event data. It shows that off-the-shelf computer vision algorithms can be applied to the reconstructions and that this strategy consistently outperforms algorithms that were specifically designed for event data.
3D Human Pose Estimation With 2D Marginal Heatmaps
This work proposes improvements to 3D coordinate prediction that avoid undesirable traits by predicting 2D marginal heatmaps under an augmented soft-argmax scheme. The resulting model, MargiPose, produces visually coherent heatmaps whilst maintaining differentiability.
2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning
This work shows that a single architecture can solve the two problems efficiently while still achieving state-of-the-art results, and that end-to-end optimization leads to significantly higher accuracy than separate learning.
Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose
This paper proposes a fine discretization of the 3D space around the subject and trains a ConvNet to predict per-voxel likelihoods for each joint, which creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.