EventHPE: Event-based 3D Human Pose and Shape Estimation

@inproceedings{Zou2021EventHPEE3,
  title={EventHPE: Event-based 3D Human Pose and Shape Estimation},
  author={Shihao Zou and Chuan Guo and Xinxin Zuo and Sen Wang and Pengyu Wang and Xiaoqin Hu and Shoushun Chen and Minglun Gong and Li Cheng},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={10976--10985}
}
  • Shihao Zou, Chuan Guo, Xinxin Zuo, Sen Wang, Pengyu Wang, Xiaoqin Hu, Shoushun Chen, Minglun Gong, Li Cheng
  • Published 15 August 2021
  • Computer Science
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
The event camera is an emerging imaging sensor that captures the dynamics of moving objects as events, which motivates our work on estimating 3D human pose and shape from event signals. Events, on the other hand, have their unique challenges: rather than capturing static body postures, the event signals are best at capturing local motions. This leads us to propose a two-stage deep learning approach, called EventHPE. The first stage, FlowNet, is trained by unsupervised learning to infer optical flow…


A Temporal Densely Connected Recurrent Network for Event-based Human Pose Estimation

A novel densely connected recurrent architecture is proposed to address the problem of incomplete information in event cameras. It explicitly models both sequential and non-sequential geometric consistency across time steps to recover the entire human body, achieving stable and accurate human pose estimation from event data.

Efficient Human Pose Estimation via 3D Event Point Cloud

This work proposes a novel representation of events, the rasterized event point cloud, aggregating events on the same position of a small time slice, which maintains the 3D features from multiple statistical cues and reduces memory consumption and computation complexity.

Bootstrapping Human Optical Flow and Pose

It is shown that, for videos involving humans in scenes, the pose estimation quality of humans can be improved by considering the two tasks at the same time, bootstrapping the optical flow and human pose estimates.

EventNeRF: Neural Radiance Fields from a Single Colour Event Camera

This paper proposes the first approach for 3D-consistent, dense and photorealistic novel view synthesis using just a single colour event stream as input and presents a neural radiance field trained entirely in a self-supervised manner from events while preserving the original resolution of the colour event channels.

Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos

A spatio-temporal tendency reasoning network for recovering human body pose and shape from videos that is competitive with the state-of-the-art on three datasets and introduces integration strategies to integrate and refine the spatio/temporal feature representations.

Event-driven Video Deblurring via Spatio-Temporal Relation-Aware Network

A new Spatio-Temporal Relation-Attention network (STRA) that models brightness changes as an extra prior to capture the blurring context in each frame, so that spatial texture can be recovered consistently from events, and that adds a temporal memory block to restore long-range dependencies of event sequences.

S2N: Suppression-Strengthen Network for Event-Based Recognition Under Variant Illuminations

A novel suppression-strengthen network (S2N) is presented to augment the event feature representation after suppressing the influence of degradation to generate robust event representation by adaptively perceiving the local variations between the center and surrounding regions.

State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.

References


EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera

This paper proposes EventCap — the first approach for 3D capturing of high-speed human motions using a single event camera, which combines model-based optimization and CNN-based human pose detection to capture high frequency motion details and to reduce the drifting in the tracking.

VIBE: Video Inference for Human Body Pose and Shape Estimation

This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.

DHP19: Dynamic Vision Sensor 3D Human Pose Dataset

A novel benchmark dataset of human body movements, the Dynamic Vision Sensor Human Pose dataset (DHP19), consisting of recordings from 4 synchronized 346x260 pixel DVS cameras, for a set of 33 movements with 17 subjects; a reference study on the dataset achieves an average 3D pose estimation error of about 8 cm.

Learning Event-Based Motion Deblurring

This paper starts from a sequential formulation of event-based motion deblurring, then shows how its optimization can be unfolded with a novel end-to-end deep architecture, and proposes a differentiable directional event filtering module to effectively extract rich boundary priors from the evolution of events.

EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras

Event-based cameras have shown great promise in a variety of situations where frame based cameras suffer, such as high speed motions and high dynamic range scenes. However, developing algorithms for

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.

Learning to Estimate 3D Human Pose and Shape from a Single Color Image

This work addresses the problem of estimating the full body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework.

Learning 3D Human Dynamics From Video

The approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning.

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

A novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image and a large-scale synthetic dataset utilizing web-crawled Mocap sequences, 3D scans and animations is constructed.

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.