NeuralFusion: Online Depth Fusion in Latent Space

@article{Weder2021NeuralFusionOD,
  title={NeuralFusion: Online Depth Fusion in Latent Space},
  author={Silvan Weder and Johannes L. Sch{\"o}nberger and Marc Pollefeys and Martin R. Oswald},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={3161-3171}
}
We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space. While previous fusion methods use an explicit scene representation like signed distance functions (SDFs), we propose a learned feature representation for the fusion. The key idea is a separation between the scene representation used for the fusion and the output scene representation, via an additional translator network. Our neural network architecture consists of two main parts: a… 
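To make the two-part design concrete, below is a minimal PyTorch-style sketch of online fusion in a latent feature grid followed by a separate translator network; the module names (LatentFusionNet, TranslatorNet), layer sizes, additive update rule, and dense voxel layout are illustrative assumptions, not the paper's exact architecture.

# Sketch: online fusion in a latent feature grid, decoded by a translator network.
# All names, sizes, and the update rule are hypothetical placeholders.
import torch
import torch.nn as nn

class LatentFusionNet(nn.Module):
    # Aggregates a per-frame depth observation into the latent features of the touched voxels.
    def __init__(self, feat_dim=8):
        super().__init__()
        self.update = nn.Sequential(
            nn.Linear(feat_dim + 1, 32), nn.ReLU(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, grid_feats, depth_obs):
        # grid_feats: (N, feat_dim) latent features at voxels touched by the current frame
        # depth_obs:  (N, 1) per-voxel observation, e.g. a signed distance along the viewing ray
        return grid_feats + self.update(torch.cat([grid_feats, depth_obs], dim=-1))

class TranslatorNet(nn.Module):
    # Translates the latent grid into the output representation (here: one TSDF value per voxel).
    def __init__(self, feat_dim=8):
        super().__init__()
        self.decode = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, grid_feats):
        return self.decode(grid_feats)

fusion, translator = LatentFusionNet(), TranslatorNet()
grid = torch.zeros(1000, 8)            # latent features for 1000 voxels (toy size)
for _ in range(5):                     # five incoming depth maps
    obs = torch.randn(1000, 1)         # toy per-voxel observations from one frame
    grid = fusion(grid, obs)           # fusion happens purely in latent space
tsdf = translator(grid)                # decode to TSDF only when an output volume is needed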
DeepSurfels: Learning Online Appearance Fusion
TLDR
This work presents an end-to-end trainable online appearance fusion pipeline that fuses information from RGB images into the proposed scene representation and is trained using self-supervision via the reprojection error with respect to the input images.
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
TLDR
To the best of the authors' knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time, and it outperforms state-of-the-art methods in terms of both accuracy and speed.
Learning Online Multi-Sensor Depth Fusion
TLDR
This work introduces SenFuNet, a depth fusion approach that learns sensor-specific noise and outlier statistics, combines depth streams from different sensors in an online fashion, and outperforms traditional and recent online depth fusion approaches.
BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
TLDR
This work proposes a novel bi-level fusion strategy that considers both efficiency and reconstruction quality by design, and evaluates the proposed method on multiple datasets quantitatively and qualitatively, demonstrating a significant improvement over existing methods.
VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction
TLDR
This work introduces VPFusion, a unified single- and multi-view neural implicit 3D reconstruction framework, and proposes a novel interleaved 3D reasoning and pairwise view association architecture for feature volume fusion across different views.
Multi-sensor large-scale dataset for multi-view 3D reconstruction
We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, …
HRBF-Fusion: Accurate 3D Reconstruction from RGB-D Data Using On-the-fly Implicits
Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level…
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
TLDR
This work presents NICE-SLAM, a dense SLAM system that incorporates multi-level local information via a hierarchical scene representation; optimizing this representation with pre-trained geometric priors enables detailed reconstruction of large indoor scenes.
Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
TLDR
The proposed Gradient-SDF is a novel representation for 3D geometry that combines the advantages of implicit and explicit representations and is as well suited for (GPU) parallelization as related approaches.
CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-scale Indoor Scene
TLDR
CIRCLE is a framework for large-scale scene completion and geometric refinement based on an end-to-end sparse convolutional network, CircNet, that jointly models local geometric details and global scene structural contexts, allowing it to preserve fine-grained object detail while recovering missing regions commonly arising in traditional 3D scene data.
...

References

Showing 1-10 of 78 references
RoutedFusion: Learning Real-Time Depth Map Fusion
TLDR
This work proposes a neural network that predicts non-linear updates to better account for typical fusion errors and outperforms the traditional fusion approach and related learned approaches on both synthetic and real data.
OctNetFusion: Learning Depth Fusion from Data
TLDR
This paper presents a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps and significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression.
DeepTAM: Deep Tracking and Mapping with Convolutional Neural Networks
TLDR
This work presents a system for dense keyframe-based camera tracking and depth map estimation that is entirely learned, and shows that generating a large number of pose hypotheses leads to more accurate predictions.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
TLDR
This work proposes Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, and demonstrates them on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations
TLDR
This work introduces a new compact and optimisable semantic representation by training a variational auto-encoder conditioned on a colour image; jointly optimising the low-dimensional codes associated with a set of overlapping images produces consistent fused label maps that preserve spatial correlation.
Convolutional Occupancy Networks
TLDR
This work proposes Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes that enables fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
TLDR
This work presents a new compact but dense representation of scene geometry, conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters.
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
TLDR
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
Deep Volumetric Video From Very Sparse Multi-view Performance Capture
TLDR
This work focuses on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results.
Atlas: End-to-End 3D Scene Reconstruction from Posed Images
TLDR
This work presents an end-to-end 3D reconstruction method that directly regresses a truncated signed distance function (TSDF) from a set of posed RGB images, and obtains semantic segmentation of the 3D model without significant additional computation.
...