Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics

@article{Mees2019Selfsupervised3S,
  title={Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics},
  author={Oier Mees and Maxim Tatarchenko and Thomas Brox and Wolfram Burgard},
  journal={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2019},
  pages={6083-6089}
}
We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image. During training, our network gets the learning signal from a silhouette of an object in the input image, a form of self-supervision. It does not require ground-truth data for 3D shapes or viewpoints. Because it relies on such a weak form of supervision, our approach can easily be applied to real-world data. We demonstrate that our method produces reasonable qualitative…
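To make the silhouette-based self-supervision concrete, here is a minimal sketch of a re-projection loss in the spirit of the abstract, assuming a voxel occupancy representation, an orthographic camera, and PyTorch; the function and tensor names are illustrative, not the authors' implementation.

```python
# Minimal sketch of silhouette-based self-supervision (assumed voxel
# representation and orthographic projection; not the authors' code).
import torch
import torch.nn.functional as F

def silhouette_loss(voxels, rotation, target_silhouette):
    """voxels: (B, 1, D, H, W) predicted occupancy probabilities in [0, 1].
    rotation: (B, 3, 3) rotation matrices for the predicted viewpoint.
    target_silhouette: (B, 1, H, W) binary object mask from the input image."""
    B = voxels.shape[0]
    # Rigid transform as a (B, 3, 4) affine matrix with zero translation.
    theta = torch.cat([rotation, torch.zeros(B, 3, 1, device=voxels.device)], dim=2)
    grid = F.affine_grid(theta, voxels.shape, align_corners=False)
    rotated = F.grid_sample(voxels, grid, align_corners=False)
    # Orthographic projection: a pixel lies inside the silhouette if any
    # voxel along its viewing ray is occupied; max over depth is a soft
    # version of that test.
    projected = rotated.max(dim=2).values            # (B, 1, H, W)
    return F.binary_cross_entropy(projected.clamp(0, 1), target_silhouette)
```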
Citations

Higher Order Function Networks for View Planning and Multi-View Reconstruction
TLDR
This work extends a recent method that uses Higher Order Functions (HOF) to represent the shape of an object, generalizing it to incorporate multiple images as input and establishing a connection between visibility and reconstruction quality.
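As a sketch of the HOF idea, one network can regress the weights of a small point-mapping network; the class name, layer sizes, and feature-to-weight mapping below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a higher-order-function decoder: an image feature is mapped to
# the weights of a tiny MLP f: R^3 -> R^3 that deforms canonical sample
# points onto the object surface. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
from math import prod

class HOFDecoder(nn.Module):
    def __init__(self, feat_dim=256, hidden=64):
        super().__init__()
        # Parameter shapes of the tiny mapping network (one hidden layer).
        self.shapes = [(hidden, 3), (hidden,), (3, hidden), (3,)]
        self.hyper = nn.Linear(feat_dim, sum(prod(s) for s in self.shapes))

    def forward(self, feat, samples):
        """feat: (B, feat_dim) image features; samples: (B, N, 3) points."""
        params = self.hyper(feat)
        out = []
        for b in range(feat.shape[0]):
            w1, b1, w2, b2 = self._unpack(params[b])
            h = torch.relu(samples[b] @ w1.t() + b1)
            out.append(h @ w2.t() + b2)
        return torch.stack(out)                      # (B, N, 3) surface points

    def _unpack(self, p):
        chunks, i = [], 0
        for s in self.shapes:
            n = prod(s)
            chunks.append(p[i:i + n].view(*s))
            i += n
        return chunks
```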
From Image Collections to Point Clouds With Self-Supervised Shape and Pose Networks
TLDR
A key novelty of the proposed technique is to impose 3D geometric reasoning on predicted 3D point clouds by rotating them with randomly sampled poses and then enforcing cycle consistency on both the 3D reconstructions and the poses.
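The cycle-consistency idea can be sketched as follows; `shape_net`, `pose_net`, and `render` are hypothetical stand-ins for the paper's components, not its actual interfaces.

```python
# Sketch of pose/shape cycle consistency: rotate the predicted cloud by a
# randomly sampled pose, map it back through the networks, and require the
# re-predicted shape and pose to match the originals.
import torch

def random_rotations(B):
    # Random rotations via QR decomposition of Gaussian matrices.
    Q, _ = torch.linalg.qr(torch.randn(B, 3, 3))
    sign = torch.linalg.det(Q).sign().view(B, 1, 1)
    return Q * sign  # flip improper rotations so det(R) = +1

def pose_cycle_loss(points, render, shape_net, pose_net):
    """points: (B, N, 3) predicted cloud in a canonical frame."""
    R = random_rotations(points.shape[0])
    rotated = points @ R.transpose(1, 2)       # apply the sampled pose
    views = render(rotated)                    # differentiable renderer
    points_cycle = shape_net(views)            # should recover the shape
    R_cycle = pose_net(views)                  # should recover the pose
    return ((points_cycle - points) ** 2).mean() + ((R_cycle - R) ** 2).mean()
```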
Tackling Two Challenges of 6D Object Pose Estimation: Lack of Real Annotated RGB Images and Scalability to Number of Objects
TLDR
This work addresses the main challenges of 6D object pose estimation: it proposes a novel self-supervision method via pose consistency, applies additional parameterisation to a backbone network, and distills knowledge from teacher networks to a student network for model compression.
Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction
TLDR
This work proposes a novel approach for modeling the dynamics of a robot’s interactions directly from unlabeled 3D point clouds and images, which leads to effective, interpretable models that can be used for visuomotor control and planning.
Learning Object Placements For Relational Instructions by Hallucinating Scene Representations
TLDR
This work presents a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image, and demonstrates the effectiveness of the method in reasoning about the best way to place objects to reproduce a spatial relation.
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video
TLDR
This work proposes a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos by using an adversarial loss, and shows that the learned embedding enables training of continuous control policies to solve novel tasks that require the interpolation of previously seen skills.
Composing Pick-and-Place Tasks By Grounding Language
TLDR
This work presents a robot system that follows unconstrained language instructions to pick and place arbitrary objects, effectively resolving ambiguities through dialogue, and demonstrates the effectiveness of the method in understanding pick-and-place language instructions and sequentially composing them to solve tabletop manipulation tasks.
Self-Supervised Euphemism Detection and Identification for Content Moderation
TLDR
This paper demonstrates unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically and identify the secret meaning of each word.
Introducing Pose Consistency and Warp-Alignment for Self-Supervised 6D Object Pose Estimation in Color Images
TLDR
This work proposes a two-stage 6D object pose estimation framework that can be applied on top of existing neural-network-based approaches and does not require pose annotations on real images, achieving state-of-the-art performance compared to methods trained only on synthetic data, domain adaptation baselines, and a concurrent self-supervised approach.

References

Showing 1-10 of 36 references
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms state-of-the-art methods for single-view reconstruction and enables 3D reconstruction of objects in situations where traditional SfM/SLAM methods fail (because of a lack of texture and/or a wide baseline).
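The recurrent multi-view fusion behind 3D-R2N2 can be sketched as a GRU-style update on a persistent 3D feature grid; the layer sizes and the feature-to-grid lift below are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of GRU-style multi-view fusion on a 3D feature grid: each new
# view's features gate an update of a persistent voxel hidden state.
import torch
import torch.nn as nn

class VoxelGRUFusion(nn.Module):
    def __init__(self, feat_dim=128, grid=4):
        super().__init__()
        self.feat_dim, self.grid = feat_dim, grid
        self.to_grid = nn.Linear(feat_dim, feat_dim * grid ** 3)  # assumed lift
        self.update = nn.Conv3d(2 * feat_dim, feat_dim, 3, padding=1)
        self.reset = nn.Conv3d(2 * feat_dim, feat_dim, 3, padding=1)
        self.cand = nn.Conv3d(2 * feat_dim, feat_dim, 3, padding=1)

    def forward(self, view_feat, hidden):
        """view_feat: (B, feat_dim); hidden: (B, feat_dim, g, g, g)."""
        B, g = view_feat.shape[0], self.grid
        x = self.to_grid(view_feat).view(B, self.feat_dim, g, g, g)
        xh = torch.cat([x, hidden], dim=1)
        u = torch.sigmoid(self.update(xh))           # update gate
        r = torch.sigmoid(self.reset(xh))            # reset gate
        c = torch.tanh(self.cand(torch.cat([x, r * hidden], dim=1)))
        return (1 - u) * hidden + u * c              # fused hidden state
```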
Learning Category-Specific Mesh Reconstruction from Image Collections
TLDR
This work presents a learning framework for recovering the 3D shape, camera, and texture of an object from a single image, incorporating texture inference as prediction of an image in a canonical appearance space, and shows that semantic keypoints can easily be associated with the predicted shapes.
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
TLDR
An encoder-decoder network with a novel projection loss defined by the projective transformation enables unsupervised learning from 2D observations without explicit 3D supervision, and shows superior performance and better generalization for 3D object reconstruction when the projection loss is used.
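A differentiable perspective projection in the spirit of the perspective transformer can be sketched by resampling the voxel grid along camera rays; the camera placement and intrinsics below are illustrative assumptions.

```python
# Sketch of a differentiable perspective projection: back-project each
# pixel to 3D points along its viewing ray, sample the voxel grid there,
# and collapse with a max over depth to obtain a 2D mask.
import torch
import torch.nn.functional as F

def perspective_project(voxels, focal=1.875, depth_range=(0.5, 1.5), steps=32):
    """voxels: (B, 1, D, H, W) occupancy in a unit cube centered at the
    origin, viewed by a camera on the -z axis (hypothetical intrinsics)."""
    B, _, D, H, W = voxels.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    zs = torch.linspace(*depth_range, steps)                 # sample depths
    x = xs[None] * zs[:, None, None] / focal                 # (steps, H, W)
    y = ys[None] * zs[:, None, None] / focal
    z = zs[:, None, None].expand(steps, H, W) - 1.0          # shift into cube
    grid = torch.stack([x, y, z], dim=-1)                    # (steps, H, W, 3)
    grid = grid[None].expand(B, -1, -1, -1, -1)
    samples = F.grid_sample(voxels, grid, align_corners=False)
    return samples.max(dim=2).values                         # (B, 1, H, W)
```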
A Point Set Generation Network for 3D Object Reconstruction from a Single Image
TLDR
This paper addresses the problem of 3D reconstruction from a single image, generating an unorthodox but straightforward form of output, the point cloud, and designs an architecture, loss function, and learning paradigm that are novel and effective, capable of predicting multiple plausible 3D point clouds from an input image.
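Point-set prediction of this kind is commonly trained with the Chamfer distance between predicted and reference clouds (one of the losses this paper uses); a minimal brute-force sketch:

```python
# Chamfer distance between two point clouds: each point is matched to its
# nearest neighbor in the other cloud. Brute-force O(N*M) pairwise
# distances, fine for small clouds.
import torch

def chamfer_distance(pred, gt):
    """pred: (B, N, 3), gt: (B, M, 3) point clouds."""
    diff = pred.unsqueeze(2) - gt.unsqueeze(1)      # (B, N, M, 3)
    dist = diff.pow(2).sum(-1)                      # squared distances
    loss_pred = dist.min(dim=2).values.mean()       # pred -> nearest gt
    loss_gt = dist.min(dim=1).values.mean()         # gt -> nearest pred
    return loss_pred + loss_gt
```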
Shape completion enabled robotic grasping
TLDR
This work provides an architecture to enable robotic grasp planning via shape completion through the use of a 3D convolutional neural network trained on a new open source dataset of over 440,000 3D exemplars captured from varying viewpoints.
3D Shape Induction from 2D Views of Multiple Objects
TLDR
The approach, called "projective generative adversarial networks" (PrGANs), trains a deep generative model of 3D shapes whose projections match the distributions of the input 2D views, which allows it to infer 3D shape and viewpoint and to generate novel views from an input image in a completely unsupervised manner.
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction
TLDR
This work presents a framework for learning single-view shape and pose prediction without using direct supervision for either, and demonstrates the applicability of the framework in a realistic setting which is beyond the scope of existing techniques.
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
TLDR
This work proposes MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape, and derives differentiable projective functions from 3D shape to 2.5D sketches.
3D ShapeNets: A deep representation for volumetric shapes
TLDR
This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvements over the state of the art in a variety of tasks.
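The binary voxel representation in this TLDR can be illustrated by discretizing a point cloud onto a 30^3 occupancy grid (the resolution used by 3D ShapeNets); a minimal sketch:

```python
# Discretize a point cloud into a binary voxel occupancy grid.
import torch

def voxelize(points, res=30):
    """points: (N, 3) coordinates in [-1, 1];
    returns a (res, res, res) binary occupancy grid."""
    idx = ((points + 1) / 2 * (res - 1)).round().long().clamp(0, res - 1)
    grid = torch.zeros(res, res, res)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid
```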
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
TLDR
A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.