Learning Task-Oriented Grasping From Human Activity Datasets

@article{Kokic2020LearningTG,
  title={Learning Task-Oriented Grasping From Human Activity Datasets},
  author={Mia Kokic and Danica Kragic and Jeannette Bohg},
  journal={IEEE Robotics and Automation Letters},
  year={2020},
  volume={5},
  pages={3352-3359}
}
We propose to leverage a real-world human activity RGB dataset to teach a robot Task-Oriented Grasping (TOG). We develop a model that takes an RGB image as input and outputs a hand pose and configuration as well as an object pose and shape. We follow the insight that jointly estimating hand and object poses increases accuracy compared to estimating these quantities independently. Given the trained model, we process an RGB dataset to automatically obtain the data to train a TOG…
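To make the joint-estimation idea concrete, below is a minimal sketch of a shared-encoder, two-headed network: one head regresses the hand pose and configuration, the other the object pose and a shape code, so both predictions are conditioned on the same image features. The backbone, layer sizes, and output dimensions (hand DoF, 7-D pose, shape-code size) are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    from torchvision import models

    class JointHandObjectNet(nn.Module):
        def __init__(self, hand_dof=51, obj_pose_dim=7, shape_dim=64):
            super().__init__()
            backbone = models.resnet18(weights=None)
            backbone.fc = nn.Identity()  # expose the 512-d image feature
            self.encoder = backbone
            # Hand head: global pose plus joint configuration (size assumed).
            self.hand_head = nn.Sequential(
                nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, hand_dof))
            # Object heads: pose (quaternion + translation) and a shape code.
            self.obj_pose_head = nn.Sequential(
                nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, obj_pose_dim))
            self.obj_shape_head = nn.Sequential(
                nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, shape_dim))

        def forward(self, rgb):
            feat = self.encoder(rgb)  # shared features couple the two estimates
            return (self.hand_head(feat),
                    self.obj_pose_head(feat),
                    self.obj_shape_head(feat))

    net = JointHandObjectNet()
    hand, obj_pose, obj_shape = net(torch.randn(1, 3, 224, 224))

Sharing the encoder is what lets errors in one estimate inform the other, which is the stated motivation for joint rather than independent estimation.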

Citations

GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes
TLDR: A generative model is introduced that jointly reasons across all levels and refines the 51 DoF of a 3D hand model to minimize a graspability loss; it robustly predicts realistic grasps, even in cluttered scenes with multiple objects in close contact.
GRAB: A Dataset of Whole-Body Human Grasping of Objects
TLDR: This work collects a new dataset, GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size, and trains GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes.
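The conditional-generation step behind a GrabNet-style model can be sketched as a decoder that maps a latent sample plus an object shape code to hand grasp parameters; all dimensions and names below are assumptions for illustration, not the GRAB authors' code.

    import torch
    import torch.nn as nn

    class GraspDecoder(nn.Module):
        def __init__(self, latent=16, shape_code=512, hand_params=61):
            super().__init__()
            # hand_params ~ global rotation/translation + articulation coefficients
            self.net = nn.Sequential(
                nn.Linear(latent + shape_code, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, hand_params))

        def forward(self, z, obj_code):
            return self.net(torch.cat([z, obj_code], dim=-1))

    dec = GraspDecoder()
    z = torch.randn(8, 16)                        # sample several grasps
    obj_code = torch.randn(1, 512).expand(8, -1)  # one unseen object's shape code
    grasps = dec(z, obj_code)                     # (8, 61) candidate hand poses

Sampling several latents for one shape code is what yields multiple plausible grasps for an unseen object.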
Dexterous Robotic Grasping with Object-Centric Visual Affordances
TLDR: The key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop to learn grasping policies that favor the same object regions favored by people.
Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping
TLDR: The GCNGrasp framework is presented, which uses the semantic knowledge of objects and tasks encoded in a knowledge graph to generalize to new object instances, classes, and even new tasks.
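A single graph-convolution step over such a task/object knowledge graph might look like the toy sketch below; the graph, features, and normalization are illustrative assumptions, not the GCNGrasp code.

    import numpy as np

    def gcn_layer(A, X, W):
        """One GCN step: H = ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
        A_hat = A + np.eye(A.shape[0])
        d = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

    # Toy graph: nodes = {mug, knife, task:pour, task:cut}; edges encode
    # "object affords task" relations taken from the knowledge base.
    A = np.array([[0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    X = np.random.randn(4, 8)   # semantic node embeddings
    W = np.random.randn(8, 8)
    H = gcn_layer(A, X, W)      # task-aware node features for grasp scoring

Propagation over the graph is what lets an unseen object inherit task knowledge from semantically related nodes.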
Robust Task-Based Grasping as a Service
TLDR: An intuitive user interface is designed that takes an object mesh as input and displays it, allowing non-specialists to indicate “stay-out” zones by painting facets of the mesh and to indicate desired forces and torques by drawing vectors.
Improving Dynamic Bounding Box using Skeleton Keypoints for Hand Pose Estimation
Human-Computer Interaction (HCI) studies the design and implementation of computer technology, focusing on the interaction between users and machines. One such human-computer interface is…
Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks
TLDR: This work introduces MAnipulation Primitive-augmented reinforcement LEarning (MAPLE), a learning framework that augments standard reinforcement learning algorithms with a pre-defined library of behavior primitives: robust functional modules specialized in achieving manipulation goals, such as grasping and pushing.
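The primitive-augmented action space can be sketched as a discrete choice over a library plus continuous parameters for the chosen module; the primitives and the `env` robot interface below are hypothetical placeholders, not MAPLE's API.

    import numpy as np

    def grasp_primitive(env, params):
        """Grasp at an (x, y, z) target; stands in for a robust closed-loop module."""
        env.move_to(params[:3])   # hypothetical robot interface
        env.close_gripper()

    def push_primitive(env, params):
        """Push from an (x, y) start along a 2-D direction."""
        env.move_to(params[:2])
        env.push(direction=params[2:4])

    PRIMITIVES = [grasp_primitive, push_primitive]

    def step(env, policy_output):
        # policy_output = (logits over primitives, shared continuous parameters)
        logits, params = policy_output
        k = int(np.argmax(logits))      # which primitive to execute
        PRIMITIVES[k](env, params)      # one primitive spans many low-level steps

Because each primitive internally handles many low-level control steps, the policy explores at the level of manipulation behaviors rather than raw joint commands.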
CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation
TLDR: A novel object-centric canonical representation at the category level is proposed, which allows establishing dense correspondence across object instances and transferring task-relevant grasps to novel instances.
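Grasp transfer through a canonical space can be sketched as matching canonical contact points against an instance's predicted canonical coordinates; the correspondence function is assumed given here (CaTGrasp learns it), and all data are synthetic placeholders.

    import numpy as np

    def transfer_grasp(canonical_contacts, instance_points, instance_canon_coords):
        """instance_canon_coords[i] = canonical coordinate predicted for
        instance_points[i]; match each canonical contact to its closest one."""
        transferred = []
        for c in canonical_contacts:
            d = np.linalg.norm(instance_canon_coords - c, axis=1)
            transferred.append(instance_points[np.argmin(d)])
        return np.stack(transferred)

    inst_pts = np.random.rand(1000, 3)      # novel instance point cloud
    canon_of = np.random.rand(1000, 3)      # its predicted canonical coordinates
    contacts = np.array([[0.5, 0.5, 0.9]])  # task-relevant canonical contact
    print(transfer_grasp(contacts, inst_pts, canon_of))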
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video (2021)
Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning. We propose…
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning
TLDR: This paper proposes a unified affordance learning framework to learn object-object interaction for various tasks, using physical simulation (SAPIEN), ShapeNet models with rich geometric diversity, and an object-kernel point convolution network to reason about detailed interaction between two objects.

References

Showing 1–10 of 34 references
Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images
TLDR: A Convolutional Neural Network for Hand-held Object Pose and Shape estimation, called HOPS-Net, is designed; prior work is utilized to estimate the hand pose and configuration, and an image-to-image translation model is employed that generates realistically textured objects given a synthetic rendering.
Understanding Everyday Hands in Action from RGB-D Images
TLDR: A large dataset of 12,000 RGB-D images covering 71 everyday grasps in natural interactions is presented, allowing for exploration of contact and force prediction from perceptual cues and illustrating the role of segmentation, object context, and 3D understanding in functional grasp analysis.
Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. Given a single RGB image, our model jointly estimates the 3D hand…
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
TLDR: A single architecture is proposed that does not rely on external detection algorithms but is trained end-to-end on single images; information is further merged and propagated in the temporal domain to infer interactions between hand and object trajectories and recognize actions.
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
TLDR: This work collects RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations, and sees clear benefits of using hand pose as a cue for action recognition compared to other data modalities.
Affordance detection for task-specific grasping using deep learning
TLDR: The notion of affordances is utilized to model relations between task, object, and grasp to address the problem of task-specific robotic grasping; the feasibility of this approach is demonstrated by employing an optimization-based grasp planner to compute task-specific grasps.
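The task/object/grasp coupling can be sketched as filtering grasp candidates by the affordance region their contacts touch before stability scoring; the labels, scores, and data below are placeholders, not the paper's pipeline.

    import numpy as np

    def task_specific_grasps(candidates, contact_labels, allowed, stability):
        """candidates: (N, d) grasp parameters; contact_labels: (N,) affordance
        id at each grasp's contact; allowed: set of ids valid for the task."""
        mask = np.array([l in allowed for l in contact_labels])
        scores = np.where(mask, stability, -np.inf)  # forbid wrong regions
        return candidates[np.argmax(scores)]

    cands = np.random.rand(50, 6)
    labels = np.random.randint(0, 3, size=50)  # e.g. 0=handle, 1=blade, 2=body
    stab = np.random.rand(50)
    best = task_specific_grasps(cands, labels, allowed={0}, stability=stab)

For a handover task, for instance, restricting contacts to the handle region leaves the blade free for the receiver.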
Learning Joint Reconstruction of Hands and Manipulated Objects
TLDR: This work presents an end-to-end learnable model that exploits a novel contact loss favoring physically plausible hand-object constellations and improves grasp-quality metrics over baselines, using RGB images as input.
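A contact loss of this flavor can be sketched as attraction of selected hand vertices toward the object surface plus a penalty on penetrating vertices; the point-to-point distances and vertex indices below are simplifying assumptions, not the paper's exact formulation.

    import torch

    def contact_loss(hand_verts, obj_points, inside_mask, attract_idx, w_pen=10.0):
        """hand_verts: (H, 3); obj_points: (O, 3) sampled object surface;
        inside_mask: (H,) bool, True where a hand vertex penetrates the object."""
        d = torch.cdist(hand_verts, obj_points)  # (H, O) pairwise distances
        nearest = d.min(dim=1).values            # distance to object surface
        attraction = nearest[attract_idx].mean() # pull fingertip regions to contact
        penetration = nearest[inside_mask].mean() if inside_mask.any() else 0.0
        return attraction + w_pen * penetration  # plausible, touching grasps

    hand = torch.randn(778, 3)                   # MANO-sized hand mesh (assumed)
    obj = torch.randn(2000, 3)
    loss = contact_loss(hand, obj, torch.zeros(778, dtype=torch.bool),
                        attract_idx=torch.tensor([333, 444, 555, 672, 744]))

Minimizing such a term drives the hand toward touching, non-intersecting configurations, which is what "physically plausible constellations" means here.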
Task-oriented grasping with semantic and geometric scene understanding
TLDR: A key element of this work is to use a deep network to integrate contextual task cues, and to defer the structured-output problem of gripper pose computation to an explicit (learned) geometric model.
Global Search with Bernoulli Alternation Kernel for Task-oriented Grasping Informed by Simulation
TLDR: A variant of Bayesian optimization that alternates between using informed and uninformed kernels is proposed, along with a neural network architecture and training pipeline that use experience from grasping objects in simulation to learn grasp stability scores.
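The alternation idea can be sketched as a Bernoulli coin flip per Bayesian-optimization iteration between an uninformed kernel and one shaped by prior (simulation-derived) knowledge; the GP, kernels, objective, and hyperparameters below are toy stand-ins, not the paper's formulation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    objective = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x  # stand-in grasp score

    uninformed = RBF(length_scale=1.0)   # generic smoothness prior
    informed = RBF(length_scale=0.2)     # pretend simulation experience set this

    X = list(rng.uniform(-1.0, 2.0, 2))  # two random initial evaluations
    y = [objective(x) for x in X]
    for t in range(20):
        kernel = informed if rng.random() < 0.5 else uninformed  # Bernoulli switch
        gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
        gp.fit(np.array(X).reshape(-1, 1), y)
        grid = np.linspace(-1.0, 2.0, 200).reshape(-1, 1)
        mu, sd = gp.predict(grid, return_std=True)
        x_next = float(grid[np.argmax(mu + 1.5 * sd)])  # UCB acquisition
        X.append(x_next)
        y.append(objective(x_next))
    print(max(y))                        # best score found

Alternating keeps the informed prior from trapping the search when it is wrong, while still exploiting it when it is right.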
Non-parametric hand pose estimation with object context
TLDR: Experiments show the non-parametric method for estimating the pose of human hands to outperform other state-of-the-art regression methods, while operating at a significantly lower computational cost than comparable model-based hand tracking methods.
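The non-parametric idea can be sketched as k-nearest-neighbor retrieval in a feature space that appends object context to the hand descriptor, followed by distance-weighted pose averaging; the features and exemplar database below are synthetic placeholders.

    import numpy as np

    def knn_pose(query_feat, db_feats, db_poses, k=5):
        d = np.linalg.norm(db_feats - query_feat, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-8)        # distance-weighted average of neighbors
        return (w[:, None] * db_poses[idx]).sum(axis=0) / w.sum()

    hand_feat, obj_feat = np.random.rand(64), np.random.rand(16)
    query = np.concatenate([hand_feat, obj_feat])  # object context in the key
    db = np.random.rand(5000, 80)                  # stored exemplar features
    poses = np.random.rand(5000, 63)               # 21 joints x 3 coordinates
    est = knn_pose(query, db, poses)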