SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis

@article{Ji2017SurfaceNetAE,
  title={SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis},
  author={Mengqi Ji and Juergen Gall and Haitian Zheng and Yebin Liu and Lu Fang},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={2326-2334}
}
  • Published 5 August 2017
This paper proposes an end-to-end learning framework for multiview stereopsis. The key method, SurfaceNet, is a fully 3D convolutional network, achieved by encoding the camera parameters together with the images in a 3D voxel representation. We evaluate SurfaceNet on the large-scale DTU benchmark.
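The core idea of encoding camera parameters together with the images is to unproject each view's colors into the voxel grid, producing a "colored voxel cube" the 3D CNN can consume. A minimal sketch of that unprojection step, assuming a standard 3x4 projection matrix and nearest-neighbor sampling (function name and grid parameters are illustrative, not the paper's API):

```python
import numpy as np

def colorize_voxel_cube(image, P, grid_min, voxel_size, grid_shape):
    """Project each voxel center through camera matrix P (3x4) and
    sample the image color, producing a colored voxel cube."""
    H, W, _ = image.shape
    cube = np.zeros(grid_shape + (3,), dtype=image.dtype)
    idx = np.indices(grid_shape).reshape(3, -1).T       # (N, 3) voxel indices
    centers = grid_min + (idx + 0.5) * voxel_size       # world coordinates
    homog = np.hstack([centers, np.ones((len(centers), 1))])
    proj = homog @ P.T                                  # (N, 3) homogeneous pixels
    uv = proj[:, :2] / proj[:, 2:3]                     # pixel coordinates
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # keep voxels in front of the camera that project inside the image
    valid = (proj[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    cube.reshape(-1, 3)[valid] = image[v[valid], u[valid]]
    return cube
```

SurfaceNet builds one such cube per view and feeds the pair jointly to the 3D network, so the geometry is baked into the input rather than handled by a separate depth-fusion stage.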
SMVNet: Deep Learning Architectures for Accurate and Robust Multi-View Stereopsis
TLDR
SMVNet is an end-to-end trainable network that can reconstruct complex outdoor 3D models and be applied to large-scale datasets in a parallel fashion, without needing to estimate or fuse multiple depth maps as other approaches typically do.
SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-View Stereopsis
TLDR
SurfaceNet+, a volumetric method that handles the 'incompleteness' and 'inaccuracy' problems induced by a very sparse MVS setup, is presented; it demonstrates the tremendous performance gap between SurfaceNet+ and the state-of-the-art methods in terms of precision and recall.
DRI-MVSNet: A depth residual inference network for multi-view stereo images
TLDR
The results of extensive experiments show that DRI-MVSNet delivers competitive performance on the DTU and the Tanks & Temples datasets, and the accuracy and completeness of the point cloud it reconstructs are significantly superior to those of state-of-the-art methods.
Learning to Reconstruct and Segment 3D Objects
TLDR
This thesis aims to understand scenes and the objects within them by learning general and robust representations using deep neural networks, trained on large-scale real-world 3D data.
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
TLDR
To the best of our knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time, and it outperforms state-of-the-art methods in terms of both accuracy and speed.
MVSCRF: Learning Multi-View Stereo With Conditional Random Fields
TLDR
This work presents a deep-learning architecture for multi-view stereo with conditional random fields (MVSCRF), and achieves comparable results with state-of-the-art learning based methods on outdoor Tanks and Temples dataset without fine-tuning, which demonstrates the method’s generalization ability.
DPSNet: End-to-end Deep Plane Sweep Stereo
TLDR
A convolutional neural network called DPSNet (Deep Plane Sweep Network) whose design is inspired by best practices of traditional geometry-based approaches for dense depth reconstruction, achieves state-of-the-art reconstruction results on a variety of challenging datasets.
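The plane-sweep construction that DPSNet builds on is easiest to see in the rectified-stereo special case, where sweeping fronto-parallel planes reduces to shifting one image over a range of disparities and scoring photoconsistency. A minimal sketch using absolute intensity difference in place of DPSNet's learned features (names are illustrative):

```python
import numpy as np

def plane_sweep_cost_volume(left, right, max_disp):
    """For rectified stereo, sweep fronto-parallel planes by shifting the
    right image over each disparity and comparing it to the left image."""
    H, W = left.shape
    volume = np.empty((max_disp, H, W))
    for d in range(max_disp):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :W - d] if d else right
        volume[d] = np.abs(left - shifted)  # photometric cost per hypothesis
    return volume

def winner_takes_all(volume):
    """Pick the cheapest plane hypothesis at each pixel."""
    return volume.argmin(axis=0)
```

Learned approaches such as DPSNet replace the raw intensity difference with deep feature correlation and regularize the cost volume with a CNN, but the sweep-then-select structure is the same.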
Attention-based Multi-View Stereo Network
TLDR
This paper proposes adding a lightweight attention module to the feature pyramids, allowing the network to infer high-resolution depth maps and achieve better reconstruction results: a more complete point cloud with less noise on the DTU benchmark.
Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction
TLDR
This work presents a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry.
ADIM-MVSNet: Adaptive Depth Interval Multi-View Stereo Network for 3D Reconstruction
TLDR
Experimental results show that the proposed method can effectively conduct the multi-view 3D reconstruction of complex scenes and achieve state-of-the-art (SOTA) reconstruction results on the DTU dataset.
...

References

SHOWING 1-10 OF 36 REFERENCES
Multi-view 3D Models from Single Images with a Convolutional Network
TLDR
A convolutional network capable of inferring a 3D representation of a previously unseen object from a single image is presented; several predicted depth maps are fused together to give a full point cloud of the object.
Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction
  • S. Galliani, K. Schindler
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction to obtain dense and accurate 3D surface models.
VoxNet: A 3D Convolutional Neural Network for real-time object recognition
  • Daniel Maturana, S. Scherer
  • Computer Science
    2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2015
TLDR
VoxNet is proposed, an architecture to tackle the problem of robust object recognition by integrating a volumetric Occupancy Grid representation with a supervised 3D Convolutional Neural Network (3D CNN).
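The volumetric occupancy grid that VoxNet-style 3D CNNs consume is built by binning sensor points into voxels. A minimal binary-occupancy sketch, assuming an axis-aligned grid (function name and parameters are hypothetical, and VoxNet itself also supports probabilistic occupancy from ray casting):

```python
import numpy as np

def points_to_occupancy(points, grid_min, voxel_size, grid_shape):
    """Bin 3D points (N, 3) into a binary occupancy grid, the volumetric
    input representation used by voxel-based 3D CNNs such as VoxNet."""
    grid = np.zeros(grid_shape, dtype=np.uint8)
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # discard points that fall outside the grid bounds
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    i, j, k = idx[inside].T
    grid[i, j, k] = 1
    return grid
```

The resulting dense grid is what makes plain 3D convolutions applicable, at the cost of memory that grows cubically with resolution.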
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
TLDR
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
3D ShapeNets: A deep representation for volumetric shapes
TLDR
This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.
Massively Parallel Multiview Stereopsis by Surface Normal Diffusion
TLDR
This work builds on the Patchmatch idea: starting from randomly generated 3D planes in scene space, the best-fitting planes are iteratively propagated and refined to obtain a 3D depth and normal field per view, such that a robust photo-consistency measure over all images is maximized.
Computing the stereo matching cost with a convolutional neural network
  • J. Zbontar, Yann LeCun
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This work trains a convolutional neural network to predict how well two image patches match and uses it to compute the stereo matching cost, which achieves an error rate of 2.61% on the KITTI stereo dataset.
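What the learned matching cost replaces is a hand-crafted patch-similarity score such as zero-mean normalized cross-correlation; the CNN in this line of work learns a better such function from labeled patch pairs. A classical NCC baseline for comparison (a sketch, not the paper's network):

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two image patches:
    +1 for identical patches, -1 for contrast-inverted ones."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

In a stereo pipeline this score (or its learned counterpart) is evaluated for each candidate disparity to populate the matching-cost volume.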
Deep Learning 3D Shape Surfaces Using Geometry Images
TLDR
This work qualitatively and quantitatively validates that creating geometry images using authalic parametrization on a spherical domain is suitable for robust learning of 3D shape surfaces, and proposes a way to implicitly learn the topology and structure of 3D shapes using geometry images encoded with suitable features.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
...