Corpus ID: 245334701

SAGA: Stochastic Whole-Body Grasping with Contact

Authors: Y. Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang
The synthesis of human grasping has numerous applications, including AR/VR, video games, and robotics. While methods have been proposed to generate realistic hand–object interactions for object grasping and manipulation, they typically consider only the interacting hand. Our goal is to synthesize whole-body grasping motions: starting from an arbitrary initial pose, we aim to generate diverse and natural whole-body human motions to approach and grasp a target object in 3D space. This task is…

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

This work introduces SMPLify-X, a new method to fit the SMPL-X body model to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground truth.

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

A hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set, with novel set-learning layers that adaptively combine features from multiple scales to learn deep point-set features efficiently and robustly.
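The hierarchical scheme this summary describes can be sketched as one "set abstraction" level: sample well-spread centroids, group neighbors within a radius, and apply a shared transform with max-pooling per group. This is a minimal illustrative sketch in numpy, not the official PointNet++ implementation; the function names and the single linear layer standing in for the shared MLP are assumptions for clarity.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Pick k well-spread centroid indices from an (N, 3) point set."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centroid.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return np.array(chosen)

def set_abstraction(points, feats, k, radius, out_dim, weight):
    """One PointNet++-style level: sample centroids, group neighbors within
    `radius`, apply a shared linear map + ReLU, and max-pool per group."""
    centroids = points[farthest_point_sampling(points, k)]
    new_feats = np.zeros((k, out_dim))
    for i, c in enumerate(centroids):
        mask = np.linalg.norm(points - c, axis=1) <= radius
        # Local coordinates relative to the centroid, concatenated with features.
        group = np.concatenate([points[mask] - c, feats[mask]], axis=1)
        new_feats[i] = np.maximum(group @ weight, 0.0).max(axis=0)
    return centroids, new_feats

rng = np.random.default_rng(1)
pts = rng.random((128, 3))
feats = rng.random((128, 4))
w = rng.standard_normal((7, 16))  # 3 local coords + 4 feature dims -> 16 dims
cent, f = set_abstraction(pts, feats, k=32, radius=0.3, out_dim=16, weight=w)
```

Stacking several such levels, each operating on the previous level's centroids and pooled features, yields the nested multi-scale hierarchy the summary refers to.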

Learning Motion Priors for 4D Human Body Capture in 3D Scenes

This work introduces a novel motion smoothness prior, which strongly reduces the jitters exhibited by poses recovered over a sequence, and designs a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training.
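A common way to express a smoothness prior of this kind (a generic sketch, not the paper's learned prior) is to penalize frame-to-frame acceleration, i.e. the second-order finite differences of joint positions over the sequence; jittery recoveries score high, smooth ones low.

```python
import numpy as np

def jitter_penalty(joints):
    """Sum of squared second-order finite differences (accelerations)
    over a motion sequence of shape (T, J, 3). Lower = smoother."""
    accel = joints[2:] - 2 * joints[1:-1] + joints[:-2]
    return float((accel ** 2).sum())

# A smooth sinusoidal trajectory vs. the same trajectory with added jitter.
t = np.linspace(0, 1, 50)
smooth = np.stack([np.sin(2 * np.pi * t)] * 3, axis=1)[:, None, :]  # (50, 1, 3)
rng = np.random.default_rng(0)
jittery = smooth + 0.05 * rng.standard_normal(smooth.shape)
```

Minimizing such a term alongside data and contact terms trades off fidelity against smoothness, which is how it reduces jitter in recovered pose sequences.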

GRAB: A Dataset of Whole-Body Human Grasping of Objects

This work collects a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size, and trains GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes.

AMASS: Archive of Motion Capture As Surface Shapes

AMASS is introduced, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization and makes it readily useful for animation, visualization, and generating training data for deep learning.

We are More than Our Joints: Predicting how 3D Bodies Move

MOJO (More than Our JOints) is a novel variational autoencoder with a latent DCT space that generates motion from latent frequencies; it preserves the full temporal resolution of the input motion, and sampling the latent frequencies explicitly introduces high-frequency components into the generated motion.
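The core idea behind such a DCT space can be illustrated in isolation (this is a toy sketch of the frequency representation, not MOJO's VAE): each joint trajectory is encoded as one DCT coefficient per frame, so the transform is invertible at full temporal resolution, and editing the high-frequency coefficients injects high-frequency detail into the decoded motion.

```python
import numpy as np

def dct_matrix(T):
    """Orthonormal DCT-II basis: rows are cosine atoms of increasing frequency."""
    n = np.arange(T)
    C = np.sqrt(2.0 / T) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / T)
    C[0] /= np.sqrt(2.0)
    return C

T, D = 60, 6  # frames x pose dimensions (toy sizes)
t = np.linspace(0, 1, T)[:, None]
motion = np.sin(2 * np.pi * t * np.arange(1, D + 1))  # toy joint trajectories

C = dct_matrix(T)
coeffs = C @ motion    # T "latent frequencies" per trajectory
recon = C.T @ coeffs   # orthonormal basis -> exact reconstruction

# Perturbing the upper half of the spectrum adds high-frequency detail
# to the decoded motion, the effect the summary describes.
rng = np.random.default_rng(0)
coeffs_hi = coeffs.copy()
coeffs_hi[T // 2:] += 0.1 * rng.standard_normal((T - T // 2, D))
detailed = C.T @ coeffs_hi
```

Because the basis is orthonormal, no temporal information is lost in the encoding, which is what distinguishes this from downsampling-based latent spaces.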

HOnnotate: A Method for 3D Annotation of Hand and Object Poses

HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and the object, is created, and a single-RGB-image method is developed to predict the hand pose when interacting with objects under severe occlusions.

Convolutional Autoencoders for Human Motion Infilling

It is shown that a single model can create natural transitions between different types of activities and complete gaps where partial poses are available, handling an arbitrary number of gaps of potentially varying length.

Grasping Field: Learning Implicit Representations for Human Grasps

The generative model is able to synthesize high-quality human grasps given only a 3D object point cloud, and achieves performance comparable to state-of-the-art methods for 3D hand reconstruction.