Real-Time Seamless Single Shot 6D Object Pose Prediction

@inproceedings{tekin2018seamless,
  title={Real-Time Seamless Single Shot 6D Object Pose Prediction},
  author={Bugra Tekin and Sudipta N. Sinha and Pascal V. Fua},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}
We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose, without requiring multiple stages or having to examine multiple hypotheses. The key component of our method is a new CNN architecture, inspired by [27, 28], that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. For single object and multiple object pose estimation on the…
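
The pipeline above (regress the 2D projections of the 3D bounding-box corners, then recover the pose with PnP) can be illustrated via the forward projection it inverts. This is a minimal sketch, not the paper's code, assuming a pinhole camera with hypothetical intrinsics; in practice the resulting 2D-3D correspondences would be passed to a PnP solver such as OpenCV's solvePnP to recover R and t:

```python
def box_corners(w, h, d):
    """8 corners of an axis-aligned 3D bounding box centered at the origin."""
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def project(points, R, t, fx, fy, cx, cy):
    """Project 3D points into the image with rotation R (3x3), translation t,
    and pinhole intrinsics (fx, fy, cx, cy)."""
    uv = []
    for X in points:
        # camera-frame coordinates: Xc = R @ X + t
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        # perspective division onto the image plane
        uv.append((fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy))
    return uv

# hypothetical example: identity rotation, object 5 units in front of the camera
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pts = project(box_corners(1.0, 1.0, 1.0), I, [0.0, 0.0, 5.0],
              fx=500, fy=500, cx=320, cy=240)
```

The network predicts the eight `pts` (plus the centroid) directly from the image; PnP then inverts `project` to recover the pose.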


OSOP: A Multi-Stage One Shot Object Pose Estimation Framework

A novel one-shot method for object detection and 6-DoF pose estimation that does not require training on the target objects: a 3D model is represented by a number of 2D templates rendered from different viewpoints, enabling CNN-based direct dense feature extraction and matching.

Real-Time 6D Object Pose Estimation on CPU

A fast and accurate 6D object pose estimation method that works from an RGB-D image, with higher accuracy and faster speed than state-of-the-art techniques, including recent CNN-based approaches.

Single Shot 6D Object Pose Estimation

This paper introduces a novel single shot approach for 6D object pose estimation of rigid objects based on depth images, where the 3D input data is spatially discretized and pose estimation is considered as a regression task that is solved locally on the resulting volume elements.

LieNet: Real-time Monocular Object Instance 6D Pose Estimation

In this work, we present LieNet, a novel deep learning framework that simultaneously detects and segments multiple object instances and estimates their 6D poses from a single RGB image without…

Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image

An end-to-end deep learning framework that jointly detects, segments, and, most importantly, recovers the 6D poses of object instances from a single RGB image; it is considerably faster than competing multi-stage methods, offering an inference speed of 10 fps that is well suited for robotic applications.

Semantic keypoint-based pose estimation from single RGB frames

An approach to estimating the continuous 6-DoF pose of an object from a single RGB image that combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model and is agnostic to whether the object is textured or textureless.

Object Pose Estimation using Mid-level Visual Representations

This work proposes a novel pose estimation model for object categories that can be effectively transferred to previously unseen environments, and shows that the approach generalizes and transfers well to such novel settings.

A Pose Proposal and Refinement Network for Better 6D Object Pose Estimation

Experiments on three benchmarks for 6D pose estimation show that the proposed pipeline outperforms state-of-the-art RGB-based methods with competitive runtime performance.

CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

This paper presents a simple one-stage approach that predicts 3D shape and estimates 6D pose and size jointly in a bounding-box-free manner, and significantly outperforms all shape completion and categorical 6D pose and size estimation baselines on the multi-object ShapeNet and NOCS datasets, respectively.

Fast Single Shot Detection and Pose Estimation

This is the first attempt to combine detection and pose estimation at the same level using a deep learning approach; the method is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

  • Mahdi Rad, V. Lepetit
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
A novel method for 3D object detection and pose estimation from color images only that uses segmentation to detect the objects of interest in 2D, even in the presence of partial occlusions and cluttered backgrounds, and is the first to report results on the Occlusion dataset using color images only.

3D Pose Regression using Convolutional Neural Networks

This work focuses on two recent state-of-the-art approaches based on Convolutional Neural Networks (CNNs): Viewpoints & Keypoints (V&K) and Render-for-CNN (Render) [8].

Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

This paper starts from the template-based approach built on the LINE2D/LINEMOD representation and extends it in two ways: the templates are learned in a discriminative fashion, and a cascade-based scheme speeds up detection.

SSD: Single Shot MultiBox Detector

The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
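
The default-box discretization SSD describes can be sketched as follows. This is an illustrative simplification (a single square feature map at a single scale, with hypothetical parameter values), not the reference implementation:

```python
import itertools

def default_boxes(fmap_size, scale, aspect_ratios):
    """Generate (cx, cy, w, h) default boxes, normalized to [0, 1], for one
    fmap_size x fmap_size feature map: each cell gets one box per aspect ratio."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx = (j + 0.5) / fmap_size   # cell center, normalized image coords
        cy = (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            # width/height keep area scale^2 while varying aspect ratio
            boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes

# 4x4 cells x 3 aspect ratios = 48 default boxes
boxes = default_boxes(fmap_size=4, scale=0.2, aspect_ratios=(1.0, 2.0, 0.5))
```

The full detector repeats this over several feature maps of decreasing resolution and increasing scale, then predicts class scores and box offsets per default box.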

Global Hypothesis Generation for 6D Object Pose Estimation

This paper addresses the task of estimating the 6D pose of a known 3D object from a single RGB-D image using a novel fully-connected Conditional Random Field that outputs a very small number of pose hypotheses, together with a new, efficient two-step optimization procedure.

PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization

This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need of additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out of image plane regression problems.
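
PoseNet's end-to-end regression parameterizes the camera pose as a 3D translation plus a unit quaternion, trained with a weighted sum of position and orientation errors. A minimal sketch of such a loss, assuming that formulation; `beta` is a scene-dependent weight and the value here is hypothetical:

```python
import math

def posenet_loss(x_pred, q_pred, x_true, q_true, beta=500.0):
    """L = ||x_true - x_pred||_2 + beta * ||q_true - q_pred/||q_pred|| ||_2."""
    # Euclidean error on the predicted camera position
    pos_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_true, x_pred)))
    # normalize the predicted quaternion before comparing orientations
    norm = math.sqrt(sum(c * c for c in q_pred))
    q_unit = [c / norm for c in q_pred]
    rot_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(q_true, q_unit)))
    return pos_err + beta * rot_err

# a perfect (up-to-scale) quaternion and exact position give zero loss
loss = posenet_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 2.0],
                    [1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 1.0])
```

Normalizing `q_pred` lets the network output an unconstrained 4-vector while the loss compares orientations on the unit sphere.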

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image

A regularized, auto-context regression framework is developed which iteratively reduces uncertainty in object coordinate and object label predictions and an efficient way to marginalize object coordinate distributions over depth is introduced to deal with missing depth information.

Combined Holistic and Local Patches for Recovering 6D Object Pose

  • Q. Cao, Haoruo Zhang
  • Computer Science
    2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
  • 2017
A novel method for recovering the 6D object pose in RGB-D images that combines holistic patches and local patches to fulfil this task, achieving high precision and good performance under foreground occlusion and background clutter.

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

This work introduces PoseCNN, a new convolutional neural network for 6D object pose estimation that is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimates using only color images as input.