On the generalization of learning-based 3D reconstruction

Miguel Ángel Bautista, Walter A. Talbott, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
State-of-the-art learning-based monocular 3D reconstruction methods learn priors over the object categories in the training set, and as a result struggle to generalize to object categories unseen during training. In this paper we study the inductive biases encoded in the model architecture that impact the generalization of learning-based 3D reconstruction methods. We find that three inductive biases impact performance: the spatial extent of the encoder, the use of the underlying…


Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors

This work argues that exploiting local priors allows the method to use input observations efficiently, improving generalization in visible areas of novel shapes, and shows that the hierarchical approach generalizes much better than the global approach.

FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction

FvOR is a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses using learnable neural network modules and achieves best-in-class results.

3D Reconstruction of Novel Object Shapes from Single Images

This work shows that the proposed SDFNet achieves state-of-the-art performance on seen and unseen shapes relative to existing methods GenRe and OccNet, and provides the first large-scale evaluation of single image shape reconstruction to unseen objects.

ZeroMesh: Zero-shot Single-view 3D Mesh Reconstruction

An end-to-end two-stage network, ZeroMesh, is proposed to break category boundaries in single-view 3D mesh reconstruction; it outperforms existing works on ShapeNet and Pix3D under different scenarios and various metrics, especially for novel objects.

Three-Dimensional Reconstruction from a Single RGB Image Using Deep Learning: A Review

This paper reviews different approaches for reconstructing 3D shapes as depth maps, surface normals, point clouds, and meshes; along with various loss functions and metrics used to train and evaluate these methods.

HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction

This report presents a photo-realistic object-centric dataset, HM3D-ABO, constructed by composing realistic indoor scenes with realistic objects; it provides multi-view RGB observations, a water-tight mesh model for each object, ground-truth depth maps, and object masks.

A Dataset-Dispersion Perspective on Reconstruction Versus Recognition in Single-View 3D Reconstruction Networks

This work hypothesizes that NNs are biased toward recognition when training images are more dispersed and training shapes are less dispersed, and introduces the dispersion score, a new data-driven metric, to quantify this leading factor and study its effect on NNs.

Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

By predicting a series of occupancy probabilities along a ray back-projected from a pixel in the camera coordinate frame, Ray-ONet improves reconstruction accuracy over Occupancy Networks (ONet) while reducing the network inference complexity to O(N²).
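As an illustration of the complexity argument (a toy sketch, not the paper's code: an analytic sphere and made-up names stand in for the learned ray network):

```python
import numpy as np

def occupancy_along_ray(origin, direction, n_samples=8, radius=0.5):
    """Stand-in for the learned ray network: occupancy probabilities at
    n_samples depths along one ray (an analytic sphere of the given
    radius replaces the real model here)."""
    t = np.linspace(0.0, 2.0, n_samples)               # sample depths
    points = origin[None, :] + t[:, None] * direction  # (n_samples, 3)
    return (np.linalg.norm(points, axis=1) < radius).astype(float)

# One call per pixel of an N x N image -> O(N^2) network evaluations,
# each yielding a whole column of occupancies along its ray, instead of
# O(N^3) separate point queries on a dense grid.
N = 4
xs = np.linspace(-1.0, 1.0, N)
pixels = np.stack(np.meshgrid(xs, xs, indexing="ij"), -1).reshape(-1, 2)
volume = np.array([
    occupancy_along_ray(np.array([x, y, -1.0]), np.array([0.0, 0.0, 1.0]))
    for x, y in pixels
]).reshape(N, N, -1)
print(volume.shape)  # (4, 4, 8): an N x N x M occupancy volume
```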

Relative Pose Estimation for RGB-D Human Input Scans via Implicit Function Reconstruction

A novel end-to-end, coarse-to-fine optimization method that is the first to combine implicit function reconstruction with differentiable rendering for RGB-D human input scans at arbitrary overlaps in relative pose estimation; it considerably outperforms standard pipelines in non-overlapping setups.

pixelNeRF: Neural Radiance Fields from One or Few Images

We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.



Learning a Multi-View Stereo Machine

End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from far fewer images than classical approaches require, as well as completion of unseen surfaces.

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

Learning Category-Specific Mesh Reconstruction from Image Collections

A learning framework for recovering the 3D shape, camera, and texture of an object from a single image by incorporating texture inference as prediction of an image in a canonical appearance space and shows that semantic keypoints can be easily associated with the predicted shapes.

Occupancy Networks: Learning 3D Reconstruction in Function Space

This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction that encodes a description of the 3D output at infinite resolution without an excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.
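A minimal sketch of the function-space idea, with a tiny random MLP standing in for the trained network (all weights and names here are illustrative, not the paper's model): the shape is stored as the function's parameters, so the memory cost is fixed while the query resolution is unbounded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained occupancy network f: R^3 -> (0, 1).
W1, b1 = rng.normal(size=(3, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

def occupancy(points):
    """points: (n, 3) array -> occupancy probabilities, shape (n,)."""
    h = np.tanh(points @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def query_grid(res):
    """Evaluate the same function on a res^3 grid over [-1, 1]^3."""
    ax = np.linspace(-1.0, 1.0, res)
    pts = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), -1).reshape(-1, 3)
    return occupancy(pts).reshape(res, res, res)

coarse, fine = query_grid(8), query_grid(32)  # one network, two resolutions
print(coarse.shape, fine.shape)  # (8, 8, 8) (32, 32, 32)
```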

What Do Single-View 3D Reconstruction Networks Learn?

This work sets up two alternative approaches that perform image classification and retrieval respectively and shows that encoder-decoder methods are statistically indistinguishable from these baselines, indicating that the current state of the art in single-view object reconstruction does not actually perform reconstruction but image classification.

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

This paper addresses 3D reconstruction from a single image, generating an unorthodox form of output: an unordered point cloud. It designs an architecture, loss function, and learning paradigm that are novel and effective, capable of predicting multiple plausible 3D point clouds from an input image.
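A standard loss for comparing predicted and ground-truth point sets in this line of work is the Chamfer distance; a minimal numpy sketch (function name and set sizes are illustrative):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (n, 3) and b (m, 3):
    mean nearest-neighbour squared distance, taken in both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

pts = np.random.default_rng(1).normal(size=(64, 3))
print(chamfer_distance(pts, pts))        # 0.0 for identical sets
print(chamfer_distance(pts, pts + 0.1))  # positive once the sets differ
```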

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

This work proposes a robust loss formulation that enforces first order consistency and for each point, selectively enforces consistency with some views, thus implicitly handling occlusions, and allows adaptation of existing CNNs to datasets without ground-truth 3D by unsupervised finetuning.

Learning to Reconstruct Shapes from Unseen Classes

This work presents an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors, with an inference network and training procedure that combine 2.5D representations of visible surfaces, spherical shape representations of both visible and non-visible surfaces, and 3D voxel-based representations.

Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

This work proposes a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision and is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster.

Unsupervised Learning of Shape and Pose with Differentiable Point Clouds

This work trains a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error, and introduces an ensemble of pose predictors which are distilled into a single "student" model.