MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, Andrew J. Davison. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from… 

Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy

This paper proposes a framework for 6D object pose estimation that learns implicit 3D information from two RGB images, and concludes that the Mid-Fusion approach is best at restoring the most precise 3D keypoints useful for object pose estimation.

6DoF Pose Estimation of Transparent Object from a Single RGB-D Image

Experimental results show that the proposed approach can effectively estimate the 6DoF pose of transparent objects, and that it outperforms the state-of-the-art baselines by a large margin.

Scalable, physics-aware 6D pose estimation for robot manipulation

Algorithms are proposed to generate and validate object poses online based on the objects’ geometry and on the physical consistency of their scene-level interactions, providing robustness even when there is a domain gap between the synthetic training and the real test scenarios.

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

A 3D geometric volume-based pose estimation method with a short-baseline two-view setting that outperforms state-of-the-art monocular methods and is robust across different objects and scenes, especially under severe occlusion.

Multi-view Fusion for Multi-level Robotic Scene Understanding

By developing and fusing recent techniques in these domains, this work provides a rich scene representation for robot awareness and demonstrates the importance of each of these modules, their complementary nature, and the potential benefits of the system in the context of robotic manipulation.

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization

A framework based on a recurrent neural network for object pose refinement, which is robust to erroneous initial poses and occlusion, and introduces a consistency-check mechanism based on the learned descriptors of the 3D model and observed 2D images, which downweights the unreliable correspondences during pose optimization.

ReorientBot: Learning Object Reorientation for Specific-Posed Placement

This work presents a vision-based manipulation system, ReorientBot, which consists of visual scene understanding with pose estimation and volumetric reconstruction using an onboard RGB-D camera, learned waypoint selection for successful and efficient motion generation for reorientation, and traditional motion planning to generate a collision-free trajectory from the selected waypoints.

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image

This paper proposes to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation, and solves the 6D object pose problem by aligning the predicted canonical representation with the back-projected object-level depth.

Generating Annotated Training Data for 6D Object Pose Estimation in Operational Environments with Minimal User Interaction

This work presents a proof of concept for a novel approach to autonomously generating annotated training data for 6D object pose estimation, and evaluates the approach in two grasping experiments, where it achieves a grasping success rate similar to that of related work using a non-autonomously generated data set.

SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

This work extends a recent RL-based registration approach towards iterative refinement of object poses and shows that considering plausibility reduces ambiguity and allows poses to be more accurately predicted in cluttered environments.



DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

DenseFusion is a generic framework for estimating 6D pose of a set of known objects from RGB-D images that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embedding, from which the pose is estimated.

MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM

This system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks and demonstrates its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.

Fusion++: Volumetric Object-Level SLAM

An online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects is proposed, and performance evaluation shows the approach is highly memory efficient and runs online at 4-8Hz despite not being optimised at the software level.

Learning 6D Object Pose Estimation Using 3D Object Coordinates

This work addresses the problem of estimating the 6D pose of specific objects from a single RGB-D image by presenting a learned, intermediate representation in the form of a dense 3D object coordinate labelling paired with a dense class labelling.

Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes

A framework for automatic modeling, detection, and tracking of 3D objects with a Kinect, showing how to build the templates automatically from 3D models and how to estimate the 6 degrees-of-freedom pose accurately and in real time.

The MOPED framework: Object recognition and pose estimation for manipulation

We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized,

Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring

Meaningful maps with object-oriented semantic mapping

This paper simultaneously builds geometric point cloud models of previously unseen instances of known object classes and creates a map that contains these object models as central entities, leveraging sparse, feature-based RGB-D SLAM, image-based deep-learning object detection and 3D unsupervised segmentation.

PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation

This work evaluates PointFusion on two distinctive datasets: the KITTI dataset that features driving scenes captured with a lidar-camera setup, and the SUN-RGBD dataset that captures indoor environments with RGB-D cameras.

The YCB object and Model set: Towards common benchmarks for manipulation research

The Yale-CMU-Berkeley (YCB) Object and Model set is intended to be used for benchmarking in robotic grasping and manipulation research, and provides high-resolution RGBD scans, physical properties and geometric models of the objects for easy incorporation into manipulation and planning software platforms.