Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving

@inproceedings{zheng2022multimodal,
  title={Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving},
  author={Jingxiao Zheng and Xin Yu Shi and Alexander N. Gorban and Junhua Mao and Yang Song and C. Qi and Ting Liu and Visesh Chari and Andre Cornman and Yin Zhou and Congcong Li and Drago Anguelov},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2022}
}
  • Published 22 December 2021
  • Computer Science
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors, including the 3D resolution and range of data, absence of dense depth maps, failure modes for LiDAR, relative location between the camera and LiDAR, and a high bar for estimation accuracy. Data collected for other use cases (such as virtual reality, gaming, and animation) may therefore not be usable for AV applications. This necessitates the collection and annotation of a large amount of 3D… 

HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

HUM3DIL (HUMan 3D from Images and LiDAR) makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin on the task of 3D pose estimation.

Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes based on Monocular Camera and Single LiDAR

This work proposes a monocular-camera and single-LiDAR-based method for 3D multi-person pose estimation in large-scale scenes, which is easy to deploy and insensitive to light; it exploits the inherent geometry constraints of the point cloud for self-supervision and utilizes 2D keypoints on images for weak supervision.
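The 2D weak-supervision idea above can be sketched as a reprojection loss: project the predicted 3D joints through the camera intrinsics and penalize their pixel distance to the detected 2D keypoints. This is a minimal numpy sketch under simple assumptions (a pinhole camera, joints already in camera coordinates); the matrix `K` and the joint layout are illustrative, not the paper's actual interface.

```python
import numpy as np

def reprojection_loss(joints_3d, keypoints_2d, K):
    """Mean squared pixel error between projected 3D joints and 2D detections.

    joints_3d:    (J, 3) predicted joints in camera coordinates (z > 0)
    keypoints_2d: (J, 2) detected 2D keypoints in pixels
    K:            (3, 3) camera intrinsic matrix
    """
    proj = (K @ joints_3d.T).T          # (J, 3) homogeneous pixel coordinates
    proj = proj[:, :2] / proj[:, 2:3]   # perspective divide -> (J, 2)
    return float(np.mean(np.sum((proj - keypoints_2d) ** 2, axis=1)))

# Toy example: 3 joints seen by a simple pinhole camera
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
joints = np.array([[ 0.0,  0.0, 2.0],
                   [ 0.1, -0.2, 2.0],
                   [-0.1,  0.3, 2.5]])
kps = (K @ joints.T).T
kps = kps[:, :2] / kps[:, 2:3]          # exact projections, so the loss is 0
```

In a training loop this scalar would be minimized alongside the point-cloud geometry terms, letting abundant 2D annotations supervise the 3D output without 3D labels.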

E3Pose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation

E3Pose incorporates an attention-based LSTM that predicts the occlusion information of each camera view and guides camera selection before the images of a scene are processed, and it runs a camera selection algorithm based on the Lyapunov optimization framework to make long-term adaptive selection decisions.

Learning Monocular 3D Human Pose Estimation from Multi-view Images

This paper trains the system to predict the same pose in all views, and proposes a method to estimate camera pose jointly with human pose, which lets us utilize multiview footage where calibration is difficult, e.g., for pan-tilt or moving handheld cameras.

3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

A spatio-temporal discriminator based on body structures as well as limb motions assesses whether the predicted pose forms a valid pose and a valid movement, and the strengths of the network's individual submodules are shown.

Rgb-D Fusion For Point-Cloud-Based 3d Human Pose Estimation

  • Jia Ying, Xu Zhao
  • Computer Science
    2021 IEEE International Conference on Image Processing (ICIP)
  • 2021
A 2D pose estimator is adopted to extract color features from the RGB image to fully exploit geometric information, and a 3D learning module is designed to extract point-wise features and take advantage of local information.

Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry

EpipolarPose is presented, a self-supervised learning method for 3D human pose estimation which does not need any 3D ground-truth data or camera extrinsics, along with a new performance measure, Pose Structure Score (PSS), a scale-invariant, structure-aware measure that evaluates the structural plausibility of a pose with respect to its ground truth.
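The multi-view self-supervision behind approaches like this rests on triangulation: 2D detections of the same joint in two calibrated views determine its 3D position, which can then serve as a pseudo-label. A minimal linear (DLT) triangulation sketch in numpy, with toy camera matrices chosen for illustration:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: (3, 4) camera projection matrices
    x1, x2: (2,) pixel coordinates of the same point in each view
    """
    # Each view contributes two linear constraints on the homogeneous point X
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # null vector of A is the solution
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity pose and a 1 m baseline along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
X_hat = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noiseless 2D inputs the recovered point matches exactly; with real detections the SVD gives the least-squares solution. Note that EpipolarPose specifically avoids needing camera extrinsics, so this is the generic epipolar-geometry building block rather than that paper's exact pipeline.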

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.

Exploiting Temporal Context for 3D Human Pose Estimation in the Wild

A bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos and shows that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data by evaluating on the 3DPW and HumanEVA datasets.

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data.
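The core operation, a dilated convolution along the time axis over per-frame 2D keypoints, can be sketched in plain numpy. The layer widths, joint count, and two-layer stacking here are illustrative toy choices, not the paper's architecture:

```python
import numpy as np

def dilated_temporal_conv(x, w, dilation):
    """'Valid' 1D convolution along time with a dilation factor.

    x: (T, C)        per-frame features (e.g. flattened 2D keypoints)
    w: (k, C, C_out) temporal filter of width k
    Returns (T - (k - 1) * dilation, C_out).
    """
    T, C = x.shape
    k, _, C_out = w.shape
    span = (k - 1) * dilation
    out = np.zeros((T - span, C_out))
    for t in range(T - span):
        taps = x[t : t + span + 1 : dilation]   # k frames, `dilation` apart
        out[t] = np.einsum('kc,kco->o', taps, w)
    return out

rng = np.random.default_rng(0)
T, J = 27, 17                        # 27 frames, 17 joints per frame
x = rng.normal(size=(T, J * 2))      # 2D keypoints, flattened per frame
w1 = rng.normal(size=(3, J * 2, 64))
w2 = rng.normal(size=(3, 64, 64))
h = dilated_temporal_conv(x, w1, dilation=1)   # (25, 64)
h = dilated_temporal_conv(h, w2, dilation=3)   # (19, 64)
```

Stacking layers with growing dilation widens the temporal receptive field exponentially while keeping the filter width fixed, which is what lets such models aggregate long temporal context cheaply.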

AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild

AdaFuse is an adaptive multiview fusion method which can enhance the features in occluded views by leveraging those in visible views, exploiting the sparsity of the heatmap representation.

Exploiting Temporal Information for 3D Human Pose Estimation

A sequence-to-sequence network is designed, composed of layer-normalized LSTM units with shortcut connections from the input to the output on the decoder side, with a temporal smoothness constraint imposed during training; this helps the network recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.
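A temporal smoothness constraint of this kind is typically an auxiliary loss that penalizes frame-to-frame jitter in the predicted pose sequence. A minimal sketch (the squared-velocity form below is one common choice, assumed here rather than taken from the paper):

```python
import numpy as np

def temporal_smoothness_loss(poses):
    """Mean squared difference between consecutive 3D poses.

    poses: (T, J, 3) sequence of predicted 3D joint positions
    """
    diffs = poses[1:] - poses[:-1]          # (T-1, J, 3) per-frame velocity
    return float(np.mean(np.sum(diffs ** 2, axis=-1)))

# A perfectly static sequence incurs zero penalty ...
static = np.tile(np.zeros((1, 17, 3)), (10, 1, 1))
# ... while a jittery one is penalized
jittery = np.random.default_rng(1).normal(size=(10, 17, 3))
```

Added to the main keypoint loss with a small weight, this term biases the network toward temporally consistent outputs even when individual 2D detections are noisy or missing.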

PoseNet3D: Unsupervised 3D Human Shape and Pose Estimation

A novel neural network framework, PoseNet3D, is proposed that takes 2D joints as input and outputs 3D skeletons and SMPL body model parameters, demonstrating that the approach reduces the 3D joint prediction error by 18% compared to previous unsupervised methods.