AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Jiaxi Jiang, Paul Streli, Huajian Qiu, Andreas Rene Fender, Larissa Laich, Patrick Snape, and Christian Holz. In European Conference on Computer Vision (ECCV), 2022.

Today’s Mixed Reality head-mounted displays track the user’s head pose in world space as well as the user’s hands for interaction in both Augmented Reality and Virtual Reality scenarios. While this is adequate to support user input, it unfortunately limits users’ virtual representations to just their upper bodies. Current systems thus resort to floating avatars, whose limitation is particularly evident in collaborative settings. To estimate full-body poses from the sparse input sources…
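To make the task's input/output dimensionality concrete, a learned mapping from the three tracked transforms (head and two hands) to full-body joint rotations could be sketched as below. This is a minimal illustration, not the AvatarPoser architecture; the layer sizes, joint count, and 6D rotation representation are assumptions.

```python
import numpy as np

# Sparse input: head + two hand transforms, each a 6D rotation
# representation plus a 3D position -> 3 * 9 = 27 features.
N_INPUTS = 3 * 9
# Output: e.g. 22 body joints as 6D rotations (SMPL-style skeleton).
N_JOINTS, OUT_DIM = 22, 22 * 6

rng = np.random.default_rng(0)
W1 = rng.standard_normal((N_INPUTS, 256)) * 0.05  # untrained toy weights
b1 = np.zeros(256)
W2 = rng.standard_normal((256, OUT_DIM)) * 0.05
b2 = np.zeros(OUT_DIM)

def predict_full_body(sparse_obs: np.ndarray) -> np.ndarray:
    """Map (27,) sparse tracking features to (22, 6) joint rotations."""
    h = np.maximum(sparse_obs @ W1 + b1, 0.0)  # ReLU hidden layer
    return (h @ W2 + b2).reshape(N_JOINTS, 6)

pose = predict_full_body(rng.standard_normal(N_INPUTS))
print(pose.shape)  # (22, 6)
```

The point of the sketch is the asymmetry: 27 observed values must constrain 132 outputs, which is why such methods lean on learned motion priors rather than geometry alone.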


Ego-Body Pose Estimation via Ego-Head Pose Estimation

A new method for ego-body pose estimation from egocentric video that decomposes the problem into two stages connected by head motion as an intermediate representation, and integrates SLAM with a learning approach to estimate accurate head motion.

QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars

This work presents a reinforcement learning framework that takes in sparse signals from an HMD and two controllers, and simulates plausible and physically valid full body motions, and shows that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.

HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction using Inertial Sensing

HOOV is presented, a wrist-worn sensing method that allows VR users to interact with objects outside their field of view by predicting hand positions and trajectories from the continuous estimation of hand orientation, which remains stable from inertial observations alone.

CHORE: Contact, Human and Object REconstruction from a single RGB image

This work introduces CHORE, a novel method that jointly reconstructs the human and the object from a single image, significantly outperforming the state of the art, and proposes a simple yet effective depth-aware scaling that enables more efficient shape learning on real data.

Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement

This work introduces a novel bilateral hand disentanglement based two-stage 3D hand generation method and proposes a Prototypical-Memory Sampling Strategy (PSS) to generate the non-deterministic hand gestures by gradient-based Markov Chain Monte Carlo (MCMC) sampling.

COUCH: Towards Controllable Human-Chair Interactions

A novel synthesis framework, COUCH, is proposed that plans motion ahead by predicting contact-aware control signals for the hands, which are then used to synthesize contact-conditioned interactions; it shows significant quantitative and qualitative improvements over existing methods for human-object interactions.

GFPose: Learning 3D Human Pose Prior with Gradient Fields

GFPose is a versatile framework for modeling plausible 3D human poses for various applications: a time-dependent score network estimates the gradient on each body joint and progressively denoises a perturbed 3D human pose to match a given task specification.
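The core mechanic, progressively denoising a perturbed pose by following a score (the gradient of a log-density), can be shown with a toy prior whose score is known in closed form. GFPose learns this gradient with a time-dependent network; here the Gaussian "pose prior", its parameters, and the damped-noise update schedule are all illustrative assumptions.

```python
import numpy as np

# Toy "pose prior": an isotropic Gaussian N(mu, sigma^2 I) over 3 values.
mu = np.array([0.0, 1.5, 0.3])  # hypothetical target pose parameters
sigma = 0.5

def score(x):
    """Gradient of log N(x; mu, sigma^2 I) with respect to x."""
    return (mu - x) / sigma**2

rng = np.random.default_rng(1)
x = mu + rng.standard_normal(3) * 3.0  # heavily perturbed pose
step = 0.01
for _ in range(500):
    # Langevin-style update: follow the score, plus damped exploration noise.
    x = x + step * score(x) + np.sqrt(2 * step) * 0.05 * rng.standard_normal(3)

print(np.round(x, 1))  # drifts back toward mu
```

Replacing the analytic `score` with a network conditioned on a noise level (and on task-specific evidence such as 2D keypoints) is what turns this toy into a pose prior usable across tasks.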

Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

This work addresses the problem of making human motion capture in the wild more practical by making use of a realistic statistical body model that includes anthropometric constraints and using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames.

Full-Body Motion from a Single Head-Mounted Device: Generating SMPL Poses from Partial Observations

This paper proposes a method based on variational autoencoders that generates articulated poses of a human skeleton from noisy streams of head and hand pose, built on a novel and theoretically well-grounded model of pose likelihood.

Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

A novel deep neural network capable of reconstructing human full body pose in real-time from 6 Inertial Measurement Units (IMUs) worn on the user's body using a bi-directional RNN architecture is demonstrated.

TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors

TransPose is presented, a DNN-based approach that performs full motion capture (with both global translations and body poses) from only six Inertial Measurement Units (IMUs) at over 90 fps, and outperforms state-of-the-art learning- and optimization-based methods by a large margin.

Estimating Egocentric 3D Human Pose in Global Space

To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset.
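The shape of such a spatio-temporal objective, a per-frame data term plus a temporal prior penalizing frame-to-frame jitter, can be sketched on a 1D toy trajectory. This is an illustrative stand-in, not the paper's heatmap-reprojection formulation; the smoothness weight and synthetic data are assumptions.

```python
import numpy as np

# Noisy per-frame observations of one joint coordinate over T frames.
rng = np.random.default_rng(2)
T = 50
truth = np.sin(np.linspace(0, 2 * np.pi, T))
obs = truth + rng.standard_normal(T) * 0.2
lam = 5.0  # weight of the temporal smoothness prior (assumed)

# Minimize  sum_t (x_t - obs_t)^2 + lam * sum_t (x_{t+1} - x_t)^2
x = obs.copy()
for _ in range(2000):  # plain gradient descent on the joint objective
    grad_data = 2 * (x - obs)
    grad_smooth = np.zeros(T)
    diff = x[1:] - x[:-1]
    grad_smooth[:-1] -= 2 * lam * diff
    grad_smooth[1:] += 2 * lam * diff
    x -= 0.01 * (grad_data + grad_smooth)

print(round(float(np.mean((obs - truth) ** 2)), 4))  # MSE before smoothing
print(round(float(np.mean((x - truth) ** 2)), 4))    # MSE after smoothing
```

Because the objective is a convex quadratic, the optimum is a linear filter of the observations; the learned motion priors in the paper play the role of `lam` here, trading per-frame fidelity against temporal stability.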

LoBSTr: Real‐time Lower‐body Pose Prediction from Sparse Upper‐body Tracking Signals

This paper introduces a deep neural network (DNN) based method for real-time prediction of the lower-body pose from only the tracking signals of the upper-body joints, and demonstrates its effectiveness through quantitative evaluations against other architectures and input representations on wild tracking data obtained from commercial VR devices.

Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors

  • Xinyu Yi, Yuxiao Zhou, F. Xu
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Motion capture from sparse inertial sensors has shown great potential compared to image-based approaches, since occlusions do not lead to reduced tracking quality and the recording space is not…

AMASS: Archive of Motion Capture As Surface Shapes

AMASS is introduced, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization and makes it readily useful for animation, visualization, and generating training data for deep learning.

Human upper-body inverse kinematics for increased embodiment in consumer-grade virtual reality

This work presents heuristics for elbow positioning based on the shoulder-to-hand distance and for avoiding unnatural joint limits, and shows that virtual arms animated with the inverse-kinematics system can be used in applications involving heavy arm movement.
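The geometric core that such elbow heuristics build on is standard two-bone analytic IK: for fixed segment lengths, the shoulder-to-hand distance alone determines the elbow flexion via the law of cosines. A minimal sketch (segment lengths and the clamping policy are assumptions, not the paper's specific heuristic):

```python
import math

def elbow_flexion(upper_arm: float, forearm: float, shoulder_to_hand: float) -> float:
    """Interior elbow angle (radians) for a two-bone arm whose hand sits at
    the given shoulder-to-hand distance, via the law of cosines.
    The distance is clamped to the reachable range [|u - f|, u + f]."""
    d = min(max(shoulder_to_hand, abs(upper_arm - forearm)), upper_arm + forearm)
    cos_e = (upper_arm**2 + forearm**2 - d**2) / (2 * upper_arm * forearm)
    return math.acos(max(-1.0, min(1.0, cos_e)))

# Equal 30 cm segments reaching a hand 0.3*sqrt(2) m away -> right angle.
angle = elbow_flexion(0.3, 0.3, 0.3 * math.sqrt(2))
print(round(math.degrees(angle), 6))  # 90.0
```

This fixes the flexion but leaves the elbow free to swivel around the shoulder-hand axis, which is exactly the degree of freedom the paper's heuristics resolve.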

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

This paper proposes a novel hybrid inverse kinematics solution (HybrIK) that surpasses the state-of-the-art methods by a large margin on various 3D human pose and shape benchmarks and preserves both the accuracy of 3D pose and the realistic body structure of the parametric human model.