Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views

  • Hao Su, C. Qi, Yangyan Li, Leonidas J. Guibas
  • Published 21 May 2015
  • Computer Science
  • 2015 IEEE International Conference on Computer Vision (ICCV)
Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs (Convolutional Neural Networks). We believe that 3D models have the potential to generate a large number of images of high variation…

Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views

This paper presents an end-to-end convolutional neural network (CNN) for 2D-3D exemplar detection. We demonstrate that the ability to adapt the features of natural images to better align with those

Viewpoint Estimation for Objects with Convolutional Neural Network Trained on Synthetic Images

A method is proposed to estimate object viewpoint from a single RGB image, addressing two problems in estimation: generating training data with viewpoint annotations and extracting powerful features.

Synthesizing Training Images for Boosting Human 3D Pose Estimation

It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data, and that CNNs trained with the authors' synthetic images outperform those trained with real photos on 3D pose estimation tasks.

Deep Single-View 3D Object Reconstruction with Visual Hull Embedding

The key idea of the method is to leverage object mask and pose estimation from CNNs to assist the 3D shape learning by constructing a probabilistic single-view visual hull inside the network.

3D Pose Regression using Convolutional Neural Networks

This work focuses on two recent state-of-the-art approaches based on Convolutional Neural Networks (CNNs): Viewpoints&Keypoints (V&K) and Render-for-CNN (Render) [8].

Towards Pose Estimation of 3D Objects in Monocular Images via Keypoint Detection

This work explores how 3D models can be used to generate lots of training images and annotations in the form of keypoint locations and proposes to use CNNs to first detect keypoints in rendered images.

Learning Camera Viewpoint Using CNN to Improve 3D Body Pose Estimation

For the first time, it is shown that camera viewpoint in combination with 2D joint locations significantly improves 3D pose accuracy without the explicit use of mathematical models of perspective geometry.

SynPo-Net—Accurate and Fast CNN-Based 6DoF Object Pose Estimation Using Synthetic Training

The proposed SynPo-Net combines a network architecture specifically designed for pose regression with a domain adaptation scheme that transforms real and synthetic images into an intermediate domain better suited to establishing correspondences.

Photorealistic Image Synthesis for Object Instance Detection

An approach to synthesize highly photorealistic images of 3D object models, which is used to train a convolutional neural network for detecting the objects in real images, and is a step towards being able to effectively train object detectors without capturing or annotating any real images.

Crafting a multi-task CNN for viewpoint estimation

This paper presents a comparison of CNN approaches in a unified setting as well as a detailed analysis of the key factors that impact performance, and presents a new joint training method with the detection task and demonstrates its benefit.



Inferring 3D Object Pose in RGB-D Images

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library by first detecting and segmenting object instances in the scene using the approach from Gupta et al.

Learning Deep Object Detectors from 3D Models

This work shows that augmenting the training data of contemporary Deep Convolutional Neural Net (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain.

Exploring Invariances in Deep Convolutional Neural Networks Using Synthetic Images

This work uses synthetic images to probe DCNN invariance to object-class variations caused by 3D shape, pose, and photorealism, and shows that DCNNs used as a fixed representation exhibit a large amount of invariances to these factors, but, if allowed to adapt, can still learn effectively from synthetic data.

Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories

This work proposes a new 3D object class model that is capable of recognizing unseen views by pose estimation and synthesis, and that performs better than or on par with state-of-the-art algorithms on the Savarese et al. 2007 and PASCAL datasets in object detection.

Viewpoints and keypoints

Pose estimation for rigid objects is characterized in terms of viewpoint determination, which explains coarse pose, and keypoint prediction, which captures the finer details; it is demonstrated that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.

3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network; the representation naturally supports object recognition from 2.5D depth maps as well as view planning for object recognition.

Convolutional Neural Networks for joint object detection and pose estimation: A comparative study

It is shown that a classification approach on discretized viewpoints achieves state-of-the-art performance for joint object detection and pose estimation, and significantly outperforms existing baselines on this benchmark.

Beyond PASCAL: A benchmark for 3D object detection in the wild

The PASCAL3D+ dataset is contributed, a novel and challenging dataset for 3D object detection and pose estimation with, on average, more than 3,000 object instances per category.

Joint embeddings of shapes and images via CNN image purification

A joint embedding space populated by both 3D shapes and 2D images of objects, where the distances between embedded entities reflect similarity between the underlying objects, which facilitates comparison between entities of either form, and allows for cross-modality retrieval.