Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery

@inproceedings{Gadre2021ActTP,
  title={Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery},
  author={Samir Yitzhak Gadre and Kiana Ehsani and Shuran Song},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={15732-15741}
}
People often use physical intuition when manipulating articulated objects, irrespective of object semantics. Motivated by this observation, we identify an important embodied task where an agent must play with objects to recover their parts. To this end, we introduce Act the Part (AtP) to learn how to interact with articulated objects to discover and segment their pieces. By coupling action selection and motion segmentation, AtP is able to isolate structures to make perceptual part recovery… 
Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations
  • Computer Science
  • 2022
TLDR
This work identifies several key issues that can fail previous imitation learning algorithms and hinder generalization to unseen instances, and proposes several general but critical techniques that accurately pinpoint and tackle these issues and can benefit category-level manipulation policy learning regardless of the task.
Continuous Scene Representations for Embodied AI
TLDR
This work proposes Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous-valued embeddings that capture pair-wise relationships between objects in a latent space.
Ditto: Building Digital Twins of Articulated Objects from Interaction
TLDR
This work introduces Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception, and shows that Ditto effectively builds digital twins of articulated objects in a category-agnostic way.
UMPNet: Universal Manipulation Policy Network for Articulated Objects
TLDR
A novel Arrow-of-Time action attribute is introduced that indicates whether an action will change the object state back to the past or forward into the future, enabling both effective state exploration and goal-conditioned manipulation.
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes
TLDR
This paper takes the first step toward building an AI system that learns inter-object functional relationships in 3D indoor environments, with key technical contributions in modeling prior knowledge by training over large-scale scenes and in designing interactive policies for effectively exploring the training scenes and quickly adapting to novel test scenes.
Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models
TLDR
The Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment, is proposed.
Unsupervised Part Discovery from Contrastive Reconstruction
TLDR
It is suggested that image reconstruction at the level of pixels can alleviate this problem, acting as a complementary cue and the standard evaluation based on keypoint regression does not correlate well with segmentation quality and thus different metrics are introduced that better characterize the decomposition of objects into parts.

References

Showing 1-10 of 60 references
SAPIEN: A SimulAted Part-Based Interactive ENvironment
  • Fanbo Xiang, Yuzhe Qin, Hao Su
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects, enabling various robotic vision and interaction tasks that require detailed part-level understanding, and the authors hope it will open research directions yet to be explored.
PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding
  • Kaichun Mo, Shilin Zhu, Hao Su
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
This work presents PartNet, a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information, and proposes a baseline method for part instance segmentation that achieves superior performance over existing methods.
ShapeNet: An Information-Rich 3D Model Repository
TLDR
ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy, a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations.
MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
TLDR
This work presents MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds that incorporates spectral synchronization into an iterative deep declarative network so as to simultaneously recover consistent correspondences as well as motion segmentation.
Where2Act: From Pixels to Actions for Articulated 3D Objects
TLDR
This paper proposes a learning-from-interaction framework with an online data sampling strategy that makes it possible to train the network in simulation (SAPIEN) and generalize across categories, and proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under force.
Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation
TLDR
An unsupervised vision-based system to estimate the joint configurations of a robot arm from a sequence of RGB or RGB-D images without knowing the model a priori, which is then adapted to the task of category-independent articulated object pose estimation, achieving higher accuracy than state-of-the-art supervised methods.
3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators
TLDR
An action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos, outperforming existing 2D and 3D dynamics models.
Learning 3D Dynamic Scene Representations for Robot Manipulation
TLDR
This paper introduces 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, reconstructs objects, and predicts their dynamics while capturing all three properties, and proposes DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR.
Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction
TLDR
This work proposes a novel approach for modeling the dynamics of a robot’s interactions directly from unlabeled 3D point clouds and images, which leads to effective, interpretable models that can be used for visuomotor control and planning.
Learning About Objects by Learning to Interact with Them
TLDR
This work presents a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction, and reveals that this agent learns efficiently and effectively; not just for objects it has interacted with before, but also for novel instances from seen categories as well as novel object categories.