Graph-based Cluttered Scene Generation and Interactive Exploration using Deep Reinforcement Learning

K. Niranjan Kumar, Irfan Essa, Sehoon Ha
2022 International Conference on Robotics and Automation (ICRA)
We introduce a novel method to teach a robotic agent to interactively explore cluttered yet structured scenes, such as kitchen pantries and grocery shelves, by leveraging the physical plausibility of the scene. We propose a novel learning framework to train an effective scene exploration policy to discover hidden objects with minimal interactions. First, we define a novel scene grammar to represent structured clutter. Then we train a Graph Neural Network (GNN) based Scene Generation agent using… 
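The abstract describes representing structured clutter as a scene graph processed by a Graph Neural Network. As a rough illustration of that idea only (the node features, relations, and mean-aggregation update below are illustrative assumptions, not the paper's actual grammar or model), a scene can be encoded as objects with feature vectors connected by support/occlusion edges, over which one round of message passing propagates context:

```python
# Hypothetical sketch: a cluttered-shelf scene as a graph, with one round of
# GNN-style mean-aggregation message passing. Names and features are invented
# for illustration and do not reflect the paper's scene grammar.

from collections import defaultdict

# Toy node features, e.g. [size, visibility]; "can" is a hidden object.
nodes = {
    "shelf":  [1.0, 0.0],
    "cereal": [0.4, 1.0],
    "can":    [0.2, 0.0],
}
# Undirected relations such as "supports" / "occludes".
edges = [("shelf", "cereal"), ("shelf", "can"), ("cereal", "can")]

def message_pass(nodes, edges):
    """One round of message passing: average neighbor features,
    then blend them with each node's own features."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for name, feat in nodes.items():
        msgs = [nodes[n] for n in neighbors[name]]
        agg = [sum(vals) / len(msgs) for vals in zip(*msgs)]
        updated[name] = [(f + a) / 2 for f, a in zip(feat, agg)]
    return updated

updated = message_pass(nodes, edges)
```

After one round, each object's features mix in information from its physical neighbors, which is the kind of relational context a scene-generation or exploration policy could condition on.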


Cascaded Compositional Residual Learning for Complex Interactive Behaviors

This work presents Cascaded Compositional Residual Learning (CCRL), a novel framework that learns composite skills by recursively leveraging a library of previously learned control policies, and shows that it learns joint-level control policies for a diverse set of motor skills ranging from basic locomotion to complex interactive navigation.

Safe, Occlusion-Aware Manipulation for Online Object Reconstruction in Confined Spaces

This work formulates the general, occlusion-aware manipulation task, and focuses on safe object reconstruction in a confined space with in-place relocation, and proposes a framework that ensures safety with completeness guarantees.

Review of Learning-Based Robotic Manipulation in Cluttered Environments

This review divides deep RL-based robotic manipulation tasks in cluttered environments into three categories, namely, object removal, assembly and rearrangement, and object retrieval and singulation tasks, and discusses the challenges and potential prospects of object manipulation in clutter.

Mechanical Search on Shelves with Efficient Stacking and Destacking of Objects

Stacking increases storage efficiency on shelves, but the lack of visibility and accessibility makes the mechanical search problem of revealing and extracting target objects difficult for robots.

SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes

This work proposes a neural network that learns a generative model for sampling consistent indoor scene layouts; it learns the co-occurrences and appearance parameters, such as shape and pose, of different object categories through a grammar-based auto-encoder, resulting in a compact and accurate representation of scene layouts.

Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation

Meta-Sim2 automatically tunes scene-structure parameters to match a target collection of real images in an unsupervised way; it uses Reinforcement Learning to train the model and designs a feature-space divergence between the synthesized and target images that is key to successful training.

SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation

A neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings by weighting messages through an attention mechanism, which significantly outperforms state-of-the-art approaches in terms of correctly predicting objects missing in a scene.

Object Finding in Cluttered Scenes Using Interactive Perception

This work proposes a reinforcement learning based active and interactive perception system for scene exploration and object search that transfers smoothly to reality and can solve the object-finding task efficiently, with a success rate of more than 88%.

Object Rearrangement Using Learned Implicit Collision Functions

A learned collision model is proposed that accepts scene and query object point clouds and predicts collisions for 6DOF object poses within the scene; it outperforms both traditional pipelines and learned ablations.

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain large-scale 2D/3D image data with perfect per-pixel ground truth. An attributed spatial…

A Deep Learning Approach to Grasping the Invisible

A target-oriented motion critic, which maps both visual observations and target information to the expected future rewards of pushing and grasping motion primitives, is learned via deep Q-learning; the motion critic and a classifier are trained in a self-supervised manner through robot-environment interactions.

Transporter Networks: Rearranging the Visual World for Robotic Manipulation

The Transporter Network is proposed, a simple model architecture that rearranges deep features to infer spatial displacements from visual input, which can parameterize robot actions; it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses.

Symbolic Relational Deep Reinforcement Learning based on Graph Neural Networks

A novel deep reinforcement learning framework based on graph neural networks is proposed that can be applied to any relational problem with an existing symbolic-relational representation; the work shows how to represent relational states with arbitrary goals, multi-parameter actions, and concurrent actions.

Goal-directed robot manipulation through axiomatic scene estimation

Generative approaches are proposed for inferring the robot's environment as a scene graph, along with axioms amenable to goal-directed manipulation through symbolic inference for task planning and collision-free motion planning and execution.