Graph-based Cluttered Scene Generation and Interactive Exploration using Deep Reinforcement Learning

Authors: K. Niranjan Kumar, Irfan Essa, Sehoon Ha
Published in: 2022 International Conference on Robotics and Automation (ICRA)

We introduce a novel method to teach a robotic agent to interactively explore cluttered yet structured scenes, such as kitchen pantries and grocery shelves, by leveraging the physical plausibility of the scene. We propose a novel learning framework to train an effective scene exploration policy to discover hidden objects with minimal interactions. First, we define a novel scene grammar to represent structured clutter. Then we train a Graph Neural Network (GNN) based Scene Generation agent using… 
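
The abstract does not spell out the grammar itself, so the following is only a minimal sketch of the general idea, not the authors' grammar. It assumes a hypothetical hierarchy (shelf, then rows, then stacks, then objects) with made-up branching ranges. Expanding the rules yields a derivation tree, which is flattened into a labelled edge list (the hypothetical relations "contains" and "supports"). A graph of this kind is the sort of input a GNN-based generator could consume.

```python
import random
from dataclasses import dataclass, field

# Hypothetical production rules: symbol -> (child symbol, min count, max count).
# "object" has no rule, so it is a terminal. The ranges are illustrative only.
RULES = {
    "shelf": ("row", 1, 3),     # a shelf holds 1-3 rows
    "row": ("stack", 1, 4),     # a row holds 1-4 stacks side by side
    "stack": ("object", 1, 3),  # a stack holds 1-3 objects, one on top of another
}


@dataclass
class Node:
    node_id: int
    symbol: str
    children: list = field(default_factory=list)


def expand(symbol, rng, counter):
    """Recursively expand `symbol`; node ids are assigned in pre-order."""
    node = Node(counter[0], symbol)
    counter[0] += 1
    if symbol in RULES:
        child_symbol, lo, hi = RULES[symbol]
        for _ in range(rng.randint(lo, hi)):
            node.children.append(expand(child_symbol, rng, counter))
    return node


def to_edges(node):
    """Flatten the derivation tree into (parent_id, child_id, relation) edges."""
    edges = []
    for i, child in enumerate(node.children):
        if node.symbol == "stack" and i > 0:
            # Within a stack, each object rests on the object directly below it.
            edges.append((node.children[i - 1].node_id, child.node_id, "supports"))
        else:
            edges.append((node.node_id, child.node_id, "contains"))
        edges.extend(to_edges(child))
    return edges


def sample_scene(seed=0):
    """Sample one structured-clutter scene graph from the hypothetical grammar."""
    rng = random.Random(seed)
    root = expand("shelf", rng, [0])
    return root, to_edges(root)


root, edges = sample_scene(seed=42)
for parent, child, relation in edges[:5]:
    print(parent, child, relation)
```

Encoding stacking as a separate "supports" relation is one simple way to make physical plausibility explicit in the graph (an object can only be supported by the object beneath it); the paper's actual representation may differ.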

Safe, Occlusion-Aware Manipulation for Online Object Reconstruction in Confined Spaces

This work formulates the general occlusion-aware manipulation task, focuses on safe object reconstruction in a confined space with in-place relocation, and proposes a framework that ensures safety with completeness guarantees.

Review of Learning-Based Robotic Manipulation in Cluttered Environments

This review divides deep RL-based robotic manipulation tasks in cluttered environments into three categories, namely, object removal, assembly and rearrangement, and object retrieval and singulation tasks, and discusses the challenges and potential prospects of object manipulation in clutter.

Mechanical Search on Shelves with Efficient Stacking and Destacking of Objects

Stacking increases storage efficiency in shelves, but the lack of visibility and accessibility makes the mechanical search problem of revealing and extracting target objects difficult for …

Learning to Singulate Objects using a Push Proposal Network

A novel neural network-based approach is presented that separates unknown objects in clutter by selecting favourable push actions; it is trained on data collected through autonomous interaction of a PR2 robot with randomly organized tabletop scenes.

SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes

This work proposes a neural network that learns a generative model for sampling consistent indoor scene layouts. Through a grammar-based autoencoder, it learns the co-occurrences and appearance parameters, such as shape and pose, of different object categories, resulting in a compact and accurate representation of scene layouts.

Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation

Meta-Sim aimed to automatically tune scene parameters, given a target collection of real images, in an unsupervised way. This work uses reinforcement learning to train the model and designs a feature-space divergence between the synthesized and target images that is key to successful training.

Split Deep Q-Learning for Robust Object Singulation

This paper proposes a pushing policy that singulates the target object from its surrounding clutter by means of lateral pushes of both the neighboring objects and the target object until sufficient ‘grasping room’ has been achieved.

SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation

A neural message-passing approach is proposed that augments an input 3D indoor scene with new objects matching their surroundings by weighting messages through an attention mechanism; it significantly outperforms state-of-the-art approaches at correctly predicting objects missing from a scene.
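
The summary gives no equations, so as a hedged illustration of attention-weighted message passing in general (not SceneGraphNet's actual architecture, which would use learned transforms rather than raw features), here is one round of updates in plain Python. The node names, feature vectors, dot-product scoring, and residual update rule are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_message_passing(features, neighbors):
    """One round: each node aggregates its neighbours' feature vectors,
    weighted by a softmax over dot-product scores with its own features."""
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(h)  # isolated node: no incoming messages
            continue
        weights = softmax([dot(h, features[n]) for n in nbrs])
        message = [
            sum(w * features[n][k] for w, n in zip(weights, nbrs))
            for k in range(len(h))
        ]
        # Residual update: add the attention-weighted message to the node's state.
        updated[node] = [a + b for a, b in zip(h, message)]
    return updated


features = {"table": [1.0, 0.0], "chair": [0.9, 0.1], "lamp": [0.0, 1.0]}
neighbors = {"table": ["chair", "lamp"], "chair": ["table"], "lamp": ["table"]}
print(attention_message_passing(features, neighbors))
```

Because the chair's features score higher against the table's than the lamp's do, the table's update is dominated by the chair's message; this is the weighting effect the summary attributes to the attention mechanism.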

Object Finding in Cluttered Scenes Using Interactive Perception

This work proposes a reinforcement-learning-based active and interactive perception system for scene exploration and object search that transfers smoothly to reality and solves the object-finding task efficiently, with a success rate above 88%.

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

This work proposes DensePhysNet, a system that actively executes a sequence of dynamic interactions and uses a deep predictive model over its visual observations to learn dense, pixel-wise representations that reflect the physical properties of observed objects.

Object Rearrangement Using Learned Implicit Collision Functions

A learned collision model is proposed that accepts scene and query-object point clouds and predicts collisions for 6-DOF object poses within the scene; it outperforms both traditional pipelines and learned ablations.

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth. An attributed spatial …

A Deep Learning Approach to Grasping the Invisible

A target-oriented motion critic, which maps both visual observations and target information to the expected future rewards of pushing and grasping motion primitives, is learned via deep Q-learning; the motion critic and the classifier are trained in a self-supervised manner through robot-environment interactions.
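
To make the Q-learning step concrete, the sketch below replaces the deep critic with a plain lookup table over the two motion primitives named in the summary. It shows the one-step target r + γ max over a' of Q(s', a') and the update toward it. The state labels, rewards, step size `alpha`, and discount `gamma` are hypothetical, and the paper's classifier is not modelled.

```python
from collections import defaultdict

PRIMITIVES = ("push", "grasp")  # the two motion primitives named in the summary


def td_target(q, reward, next_state, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    if done:
        return reward
    return reward + gamma * max(q[(next_state, a)] for a in PRIMITIVES)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, done=False):
    """Move Q(s, a) a fraction alpha of the way toward the TD target."""
    target = td_target(q, reward, next_state, gamma, done)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # tabular stand-in for the deep motion critic
# Hypothetical transitions: pushing reveals the hidden target, then a grasp succeeds.
q_update(q, "occluded", "push", reward=0.0, next_state="visible")
q_update(q, "visible", "grasp", reward=1.0, next_state="terminal", done=True)
q_update(q, "occluded", "push", reward=0.0, next_state="visible")
print(q[("occluded", "push")], q[("visible", "grasp")])
```

The push receives no immediate reward, yet its value rises once the later grasp pays off. This delayed credit is why a Q-learned critic can learn to push in order to expose an invisible target before grasping it.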