Corpus ID: 235658908

VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

Ruihai Wu, Yan Zhao, Kaichun Mo, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas J. Guibas, Hao Dong
Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in its myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract kinematic structure, with estimated joint parameters and part poses, as the visual representations for manipulating 3D articulated objects. In…


AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions

A novel framework, AdaAfford, is proposed that learns to perform very few test-time interactions to quickly adapt affordance priors into more accurate instance-specific posteriors; experiments show the system performs better than baselines.

Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery

This work introduces Structure from Action (SfA), a framework that discovers the 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions, and demonstrates that a single SfA model trained in simulation can generalize to many unseen object categories with unknown kinematic structures and to real-world objects.

End-to-End Affordance Learning for Robotic Manipulation

This study takes advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest, which leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.

Learning Agent-Aware Affordances for Closed-Loop Interaction with Articulated Objects

The concept of agent-aware affordances, which fully reflect the agent's capabilities and embodiment, is introduced, and they are shown to outperform state-of-the-art counterparts that are conditioned only on the end-effector geometry.

Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation

This paper proposes a two-stage architecture for autonomous interaction with large articulated objects in unknown environments, and shows that the proposed pipeline can handle complex static and dynamic kitchen settings for both wheel-based and legged mobile manipulators.

H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions

This work proposes H-SAUR, a probabilistic generative framework that generates a distribution of hypotheses about how objects articulate given input observations, captures certainty over those hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation in autonomous agents.

Learning Object Affordance with Contact and Grasp Generation

This paper proposes to formulate object affordance understanding as the generation of both contacts and grasp poses, and factorizes the learning task into two sequential stages; the approach outperforms state-of-the-art methods regarding grasp generation on various metrics.

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

A strong 3D part segmentation method is proposed from the perspective of domain generalization, integrating adversarial learning techniques for cross-category part segmentation and pose estimation; it outperforms all existing methods by a large margin.

CS 6301 Introduction to Robot Manipulation and Navigation Project Proposal Description

For project evaluation, all three categories will be considered equally. A project will be evaluated according to its quality in terms of implementation, experiments, presentation, and writing.

DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation

This work proposes DualAfford, a novel learning framework that learns collaborative affordance for dual-gripper manipulation tasks, reducing the quadratic problem for two grippers into two disentangled yet inter-connected subtasks for efficient learning.



Learning Robotic Manipulation through Visual Planning and Acting

This work learns to imagine goal-directed object manipulation directly from raw image data of the robot's self-supervised interaction with the object, and shows that separating the problem into visual planning and visual tracking control is more efficient and more interpretable than alternative data-driven approaches.

Where2Act: From Pixels to Actions for Articulated 3D Objects

This paper proposes a learning-from-interaction framework with an online data-sampling strategy that allows the network to be trained in simulation (SAPIEN) and to generalize across categories; it also proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under the force.
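As a rough illustration of the per-point "actionability" idea described above, the following is a minimal sketch: in the paper a learned network produces the scores, whereas here a hand-coded heuristic and a toy point cloud stand in for it (all names and data are assumptions, not the authors' code).

```python
import numpy as np

# Toy point cloud standing in for the depth observation of an articulated object.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(1000, 3))

def actionability_scores(points: np.ndarray) -> np.ndarray:
    """Hypothetical per-point scores in [0, 1]. A learned network would
    replace this heuristic (here: prefer points higher up, where pulling
    is more likely to move a door or drawer part)."""
    z = points[:, 2]
    return (z - z.min()) / (z.max() - z.min() + 1e-8)

scores = actionability_scores(points)
top5 = points[np.argsort(scores)[-5:]]  # the 5 most "actionable" contact points
print(top5.shape)  # (5, 3)
```

The point of the representation is that action proposals become a dense labeling of the observation, so manipulation reduces to picking high-scoring locations rather than regressing joint parameters first.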

Visual Identification of Articulated Object Parts

This work proposes FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks, and achieves an articulation type classification accuracy of 82.5% on novel object instances in trained categories.

GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes

A generative model is introduced that jointly reasons at all levels and refines the 51 DoF of a 3D hand model to minimize a graspability loss; it can robustly predict realistic grasps, even in cluttered scenes with multiple objects in close contact.

Learning Semantic Keypoint Representations for Door Opening Manipulation

A novel method for opening unseen doors with no prior knowledge of the door model is proposed, which leverages semantic 3D keypoints as door-handle representations to generate the end-effector trajectory from a motion planner.

Spatial Action Maps for Mobile Manipulation

This work presents "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location.
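The pixel-map action representation described above can be sketched in a few lines: a policy scores every pixel of the state image, and acting means selecting the highest-valued pixel as the local navigational endpoint. This is an illustrative toy, not the paper's implementation; the function name and the hand-filled map are assumptions.

```python
import numpy as np

def select_action(action_map: np.ndarray) -> tuple[int, int]:
    """Pick the pixel with the highest predicted action value.
    The returned (row, col) indexes a location in the state image."""
    flat_idx = int(np.argmax(action_map))
    row, col = np.unravel_index(flat_idx, action_map.shape)
    return int(row), int(col)

# Toy 4x4 "action map" aligned with a 4x4 state image; in the paper these
# values would come from a network, one per pixel.
action_map = np.zeros((4, 4))
action_map[2, 1] = 0.9  # the most promising endpoint
print(select_action(action_map))  # (2, 1)
```

Because the action space is aligned pixel-for-pixel with the input image, the same convolutional features serve both perception and action selection.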

Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery

Act the Part is introduced to learn how to interact with articulated objects to discover and segment their parts by coupling action selection and motion segmentation; it is able to isolate structures to make perceptual part recovery possible without semantic labels.

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Embodied agents operating in human spaces must be able to master how their environment works: what objects can the agent use, and how can it use them? We introduce a reinforcement learning approach

Learning Dexterous Grasping with Object-Centric Visual Affordances

This work proposes an approach for learning dexterous grasping that embeds an object-centric visual affordance model within a deep reinforcement learning loop to learn grasping policies that favor the same object regions favored by people.
