VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction

@inproceedings{Choe2021VolumeFusionDD,
  title={VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction},
  author={Jaesung Choe and Sunghoon Im and François Rameau and Minjun Kang and In-So Kweon},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={16066-16075}
}
To reconstruct a 3D scene from a set of calibrated views, traditional multi-view stereo techniques rely on two distinct stages: local depth map computation and global depth map fusion. Recent studies concentrate either on deep neural architectures for depth estimation combined with a conventional depth fusion method, or on direct 3D reconstruction networks that regress a Truncated Signed Distance Function (TSDF). In this paper, we advocate that replicating the traditional two-stage framework with deep neural…
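The abstract contrasts a learned first stage (per-view depth estimation) with the classical second stage (global depth-map fusion). For orientation, below is a minimal NumPy sketch of that second stage, classical TSDF fusion of posed depth maps in the spirit of Curless and Levoy; the function name, the pinhole projection, and the uniform voxel grid are illustrative assumptions, not VolumeFusion's actual implementation.

```python
import numpy as np

def fuse_depth_maps(depth_maps, poses, K, grid_shape, voxel_size, trunc):
    """depth_maps: list of (H, W) arrays; poses: list of (4, 4) world-to-camera
    matrices; K: (3, 3) intrinsics. Returns the fused TSDF and its weights."""
    tsdf = np.ones(grid_shape, dtype=np.float32)     # truncated signed distances
    weight = np.zeros(grid_shape, dtype=np.float32)  # per-voxel observation count

    # Homogeneous world coordinates of every voxel centre.
    idx = np.stack(np.meshgrid(*[np.arange(s) for s in grid_shape],
                               indexing="ij"), -1).reshape(-1, 3)
    world = np.hstack([idx * voxel_size, np.ones((len(idx), 1))])

    for depth, pose in zip(depth_maps, poses):
        cam = (pose @ world.T).T[:, :3]              # voxel centres in camera frame
        z = cam[:, 2]
        zsafe = np.where(z > 1e-6, z, 1.0)           # avoid divide-by-zero
        uvw = (K @ cam.T).T
        u = np.rint(uvw[:, 0] / zsafe).astype(int)
        v = np.rint(uvw[:, 1] / zsafe).astype(int)
        h, w = depth.shape
        ok = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = depth[v.clip(0, h - 1), u.clip(0, w - 1)]
        sdf = d - z                                  # signed distance along the ray
        ok &= (d > 0) & (sdf > -trunc)
        obs = np.clip(sdf / trunc, -1.0, 1.0)

        # Running weighted average: the standard non-learned update rule.
        lin = np.ravel_multi_index(idx[ok].T, grid_shape)
        t, wgt = tsdf.ravel(), weight.ravel()        # views into the volumes
        t[lin] = (t[lin] * wgt[lin] + obs[ok]) / (wgt[lin] + 1.0)
        wgt[lin] += 1.0

    return tsdf, weight
```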
PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for Weakly-Textured Surface Reconstruction
TLDR
Experiments show that the unsupervised method can decrease matching ambiguity, notably improve the completeness of weakly-textured reconstructions, and reach the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D.
Multi-sensor large-scale dataset for multi-view 3D reconstruction
We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, …
High-quality Voxel Reconstruction from Stereoscopic Images
TLDR
Preliminary results show 80% agreement with the original models across two categories under the Intersection-over-Union metric, indicating that good reconstructions can be obtained from a small dataset while reducing the time and memory required for this task.
Learning Online Multi-Sensor Depth Fusion
TLDR
SenFuNet is introduced, a depth fusion approach that learns sensor-specific noise and outlier statistics and combines the data streams of depth frames from different sensors in an online fashion; it outperforms traditional and recent online depth fusion approaches.
Modern Augmented Reality: Applications, Trends, and Future Directions
TLDR
An overview of modern augmented reality from both an application-level and a technical perspective, surveying around 100 recent promising machine-learning-based works developed for AR systems, such as deep learning for AR shopping, AR-based image filters, and more.
Deep Point Cloud Reconstruction
TLDR
A deep point cloud reconstruction network is proposed, consisting of a 3D sparse stacked-hourglass network for initial densification and denoising, followed by a transformer-based refinement, using a novel amplified positional encoding, that converts the discrete voxels into 3D points.
MonoScene: Monocular 3D Semantic Scene Completion
TLDR
Experiments show the MonoScene framework outperforms the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view.
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
TLDR
NICE-SLAM is presented, a dense SLAM system that incorporates multi-level local information through a hierarchical scene representation; optimizing this representation with pre-trained geometric priors enables detailed reconstruction of large indoor scenes.

References

SHOWING 1-10 OF 45 REFERENCES
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
TLDR
To the best of the authors' knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time, and it outperforms state-of-the-art methods in terms of both accuracy and speed.
DPSNet: End-to-end Deep Plane Sweep Stereo
TLDR
A convolutional neural network called DPSNet (Deep Plane Sweep Network), whose design is inspired by best practices of traditional geometry-based approaches to dense depth reconstruction, achieves state-of-the-art reconstruction results on a variety of challenging datasets.
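Several entries on this page build on plane sweeping, so a minimal sketch of the core idea may help: source-view features are warped onto fronto-parallel planes at candidate depths via plane-induced homographies and scored against the reference view. The function name, nearest-neighbour sampling, and L1 cost below are illustrative assumptions; DPSNet itself matches deep features and regularises the resulting volume with 3D convolutions.

```python
import numpy as np

def plane_sweep_cost(ref_feat, src_feat, K, R, t, depths):
    """ref_feat, src_feat: (C, H, W) feature maps from calibrated views.
    R, t: relative pose with X_src = R @ X_ref + t. Returns (D, H, W) costs."""
    C, H, W = ref_feat.shape
    Kinv = np.linalg.inv(K)
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    n = np.array([0.0, 0.0, 1.0])                     # fronto-parallel plane normal

    cost = np.empty((len(depths), H, W))
    for i, d in enumerate(depths):
        # Homography induced by the plane z = d in the reference frame.
        Hmat = K @ (R + np.outer(t, n) / d) @ Kinv
        warped = Hmat @ pix
        wz = np.where(np.abs(warped[2]) > 1e-9, warped[2], 1e-9)
        u = np.rint(warped[0] / wz).astype(int).clip(0, W - 1)
        v = np.rint(warped[1] / wz).astype(int).clip(0, H - 1)
        sampled = src_feat[:, v, u].reshape(C, H, W)  # nearest-neighbour warp
        cost[i] = np.abs(ref_feat - sampled).mean(0)  # L1 feature distance
    return cost
```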
TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
TLDR
This work introduces TransformerFusion, a transformer-based 3D scene reconstruction approach that results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion.
Point-Based Multi-View Stereo Network
TLDR
This work introduces Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS) that directly processes the target scene as point clouds, allowing higher accuracy, greater computational efficiency, and more flexibility than cost-volume-based counterparts.
RoutedFusion: Learning Real-Time Depth Map Fusion
TLDR
This work proposes a neural network that predicts non-linear updates to better account for typical fusion errors and outperforms the traditional fusion approach and related learned approaches on both synthetic and real data.
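To make the contrast concrete, here is a toy sketch of a classical linear running-average TSDF update next to a learned non-linear update of the kind RoutedFusion proposes. The two-layer MLP and its random weights are purely illustrative stand-ins, not the paper's architecture; in practice the weights would be trained so that the predicted correction suppresses typical depth-sensor errors.

```python
import numpy as np

def running_average_update(tsdf_old, weight_old, tsdf_obs):
    """Traditional TSDF fusion: a linear running average per voxel."""
    new = (tsdf_old * weight_old + tsdf_obs) / (weight_old + 1.0)
    return new, weight_old + 1.0

# Toy "learned" update: a tiny MLP maps (old value, weight, observation) to a
# bounded non-linear correction. Weights here are random placeholders; a real
# system would train them end-to-end against ground-truth geometry.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((16, 3)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((1, 16)), np.zeros(1)

def learned_update(tsdf_old, weight_old, tsdf_obs):
    """Per-voxel non-linear update; all inputs are arrays of the same shape."""
    x = np.stack([tsdf_old, weight_old, tsdf_obs], axis=-1)
    h = np.maximum(x @ W1.T + b1, 0.0)          # ReLU hidden layer
    delta = np.tanh(h @ W2.T + b2)[..., 0]      # bounded correction
    return np.clip(tsdf_old + delta, -1.0, 1.0), weight_old + 1.0
```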
MVSNet: Depth Inference for Unstructured Multi-view Stereo
TLDR
This work presents an end-to-end deep learning architecture for depth map inference from multi-view images that flexibly adapts arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature.
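The variance-based cost metric is simple enough to state directly: given N per-view feature volumes already warped into the reference camera frustum, the cost is their element-wise variance over views, which is low wherever the views photometrically agree at a hypothesised depth. A minimal sketch with illustrative shapes:

```python
import numpy as np

def variance_cost(feature_volumes):
    """feature_volumes: (N, C, D, H, W) features from N views, pre-warped
    into the reference frustum. Returns a (C, D, H, W) cost volume."""
    mean = feature_volumes.mean(axis=0)
    return ((feature_volumes - mean) ** 2).mean(axis=0)

# Example: 3 views, 8 feature channels, 32 depth hypotheses on a 16x16 grid.
volumes = np.random.rand(3, 8, 32, 16, 16).astype(np.float32)
cost = variance_cost(volumes)  # shape (8, 32, 16, 16); low where views agree
```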
MVDepthNet: Real-Time Multiview Depth Estimation Neural Network
TLDR
MVDepthNet is presented, a convolutional network to solve the depth estimation problem given several image-pose pairs from a localized monocular camera in neighbor viewpoints, and it is shown that this method can generate depth maps efficiently and precisely.
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
TLDR
This paper proposes RayNet, which combines a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion and trains RayNet end-to-end using empirical risk minimization.
Accurate 3D Reconstruction from Small Motion Clip for Rolling Shutter Cameras
TLDR
This paper proposes a pipeline for a fine-scale dense 3D reconstruction that models the rolling shutter effect by utilizing both sparse 3D points and the camera trajectory from narrow-baseline images, and shows accurate dense reconstruction results suitable for various sought-after applications.
Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation
TLDR
A geometry-aware stereo-LiDAR fusion network for long-range depth estimation, called the volumetric propagation network, is proposed; it exploits sparse and accurate point clouds as a cue for guiding the correspondences of stereo images in a unified 3D volume space.