NeuralFusion: Online Depth Fusion in Latent Space

@article{Weder2021NeuralFusionOD,
  title={NeuralFusion: Online Depth Fusion in Latent Space},
  author={Silvan Weder and Johannes L. Sch{\"o}nberger and Marc Pollefeys and Martin R. Oswald},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={3161-3171}
}
We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space. While previous fusion methods use an explicit scene representation like signed distance functions (SDFs), we propose a learned feature representation for the fusion. The key idea is a separation between the scene representation used for the fusion and the output scene representation, via an additional translator network. Our neural network architecture consists of two main parts: a… 
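The truncated abstract describes fusing incoming depth maps into a learned latent feature grid and decoding that latent state with a separate translator network only when an explicit surface is needed. Below is a minimal PyTorch-style sketch of that separation, not the paper's actual architecture: the module names, layer sizes, feature dimension, and the additive update rule are illustrative assumptions.

```python
# Hypothetical sketch: fuse depth observations into a per-voxel latent feature
# grid (FusionNet), then decode it with a separate TranslatorNet. All shapes
# and the update rule are assumptions for illustration.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Updates per-voxel latent features from a new depth observation."""
    def __init__(self, feat_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 32), nn.ReLU(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, old_feat, tsdf_obs):
        # old_feat: (N, feat_dim) latent state of the voxels seen in this frame
        # tsdf_obs: (N, 1) signed distance of each voxel to the observed depth
        return old_feat + self.mlp(torch.cat([old_feat, tsdf_obs], dim=-1))

class TranslatorNet(nn.Module):
    """Decodes the latent feature grid into an explicit SDF value per voxel."""
    def __init__(self, feat_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feat):
        return self.mlp(feat)

feat_dim, n_voxels = 8, 1000
latent_grid = torch.zeros(n_voxels, feat_dim)
fuse, translate = FusionNet(feat_dim), TranslatorNet(feat_dim)

for _ in range(5):                       # stand-in for a stream of depth maps
    tsdf_obs = torch.randn(n_voxels, 1)  # stand-in for projected depth residuals
    latent_grid = fuse(latent_grid, tsdf_obs)   # online fusion in latent space

sdf = translate(latent_grid)             # explicit SDF, e.g. for marching cubes
```

Keeping the fusion state latent and translating it on demand is what separates the representation used for aggregation from the output representation described in the abstract.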
DeepSurfels: Learning Online Appearance Fusion
TLDR
An end-to-end trainable online appearance fusion pipeline is presented that fuses information from RGB images into the proposed scene representation and is trained with self-supervision from the reprojection error with respect to the input images.
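The reprojection-error self-supervision mentioned above can be illustrated with a small hedged sketch: project fused 3D points and their fused colours into an input view and penalise the colour difference against that image. The camera model, sampling, and function signature below are simplified assumptions, not DeepSurfels' pipeline.

```python
# Illustrative reprojection loss: sample the input image at the projected
# locations of fused 3D points and compare against their fused colours.
import torch
import torch.nn.functional as F

def reprojection_loss(points, colors, image, K, cam_T_world):
    # points: (N, 3) world coordinates, colors: (N, 3) fused RGB in [0, 1]
    # image: (3, H, W) input view, K: (3, 3) intrinsics, cam_T_world: (4, 4) pose
    p_h = torch.cat([points, torch.ones(len(points), 1)], dim=1)   # homogeneous
    p_cam = (cam_T_world @ p_h.T).T[:, :3]                         # camera frame
    uvw = (K @ p_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                  # pixel coords
    H, W = image.shape[-2:]
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(image[None], grid[None, None], align_corners=True)
    return F.l1_loss(sampled[0, :, 0].T, colors)                   # (N, 3) vs (N, 3)
```

Minimising such a loss over the fused appearance needs no ground-truth 3D colour, only the input images themselves, which is the sense in which the training is self-supervised.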
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
TLDR
To the best of the authors' knowledge, this is the first learning-based system that is able to reconstruct dense coherent 3D geometry in real-time and outperforms state-of-the-art methods in terms of both accuracy and speed.
BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
TLDR
This work proposes a novel bi-level fusion strategy that considers both efficiency and reconstruction quality by design, and evaluates the proposed method on multiple datasets quantitatively and qualitatively, demonstrating a significant improvement over existing methods.
Learning Online Multi-Sensor Depth Fusion
TLDR
SenFuNet is introduced, a depth fusion approach that learns sensor-specific noise and outlier statistics and combines depth streams from different sensors in an online fashion, outperforming traditional and recent online depth fusion approaches.
HRBF-Fusion: Accurate 3D Reconstruction from RGB-D Data Using On-the-Fly Implicits
Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level…
Multi-sensor large-scale dataset for multi-view 3D reconstruction
We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, …
VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction
TLDR
This work introduces a unified single and multi-view neural implicit 3D reconstruction framework VPFusion, and proposes a novel interleaved 3D reasoning and pairwise view association architecture for feature volume fusion across different views.
CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-scale Indoor Scene
TLDR
CIRCLE is a framework for large-scale scene completion and geometric refinement based on an end-to-end sparse convolutional network, CircNet, that jointly models local geometric details and global scene structural contexts, allowing it to preserve fine-grained object detail while recovering missing regions commonly arising in traditional 3D scene data.
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
TLDR
It is shown for the first time that a single network can represent scene geometry over time continually without catastrophic forgetting, while achieving promising trade-offs between accuracy and efficiency.
Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
TLDR
The proposed Gradient-SDF represents a novel representation for 3D geometry that combines the advantages of implicit and explicit representations and is equally suited for (GPU) parallelization as related approaches.
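A hedged sketch of the semi-implicit idea: if every voxel stores its signed distance together with the distance gradient, the nearest surface point can be read off directly (explicit use) while the stored distances still form an implicit field. Names and the sign convention below are illustrative assumptions.

```python
# Illustration of a Gradient-SDF-style voxel: signed distance plus gradient
# lets us project a voxel center straight onto the surface.
import numpy as np

def nearest_surface_point(voxel_center, sdf, grad):
    """Step from the voxel center back along the gradient by the signed distance."""
    n = grad / (np.linalg.norm(grad) + 1e-9)   # unit surface-normal estimate
    return voxel_center - sdf * n

p = nearest_surface_point(np.array([0.1, 0.2, 0.3]), sdf=0.05,
                          grad=np.array([0.0, 0.0, 1.0]))
# p ~ [0.1, 0.2, 0.25]: this voxel lies 0.05 units in front of the surface.
```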

References

Showing 1-10 of 78 references
OctNetFusion: Learning Depth Fusion from Data
TLDR
This paper presents a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps and significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression.
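For context, the "traditional volumetric fusion approach" that OctNetFusion is compared against is typically a running weighted average of truncated signed distances per voxel. A minimal NumPy sketch of that classical update follows; the truncation value, weighting, and validity test are illustrative assumptions.

```python
# Classical weighted-average TSDF fusion (the hand-crafted baseline that
# learned fusion methods aim to improve on). Values are illustrative.
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.05, max_weight=100.0):
    """Fuse one observed signed distance per voxel into the running average."""
    obs = np.clip(sdf_obs / trunc, -1.0, 1.0)   # truncate and normalise
    valid = sdf_obs > -trunc                    # ignore voxels far behind the surface
    fused = np.where(valid, (tsdf * weight + obs) / (weight + 1.0), tsdf)
    new_w = np.where(valid, np.minimum(weight + 1.0, max_weight), weight)
    return fused, new_w

tsdf, weight = np.zeros(100), np.zeros(100)
tsdf, weight = tsdf_update(tsdf, weight, sdf_obs=np.random.randn(100) * 0.02)
```

Because this update averages every observation with a fixed rule, it cannot suppress outliers or adapt to sensor-specific noise, which is the gap the learned fusion approaches target.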
DeepTAM: Deep Tracking and Mapping with Convolutional Neural Networks
TLDR
This work presents a system for dense keyframe-based camera tracking and depth map estimation that is entirely learned, and shows that generating a large number of pose hypotheses leads to more accurate predictions.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
TLDR
The proposed Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations
TLDR
This work introduces a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image and jointly optimising the low-dimensional codes associated with each of a set of overlapping images, producing consistent fused label maps which preserve spatial correlation.
CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
TLDR
A new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters is presented.
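The compact, optimisable code described above can be sketched as a decoder that maps a single image plus a low-dimensional code to a dense depth map; because the code is small, it can be optimised per keyframe while the decoder stays fixed. The layer sizes and shapes below are assumptions for illustration, not CodeSLAM's network.

```python
# Hypothetical CodeSLAM-style decoder: dense depth from an intensity image
# conditioned on a small optimisable code.
import torch
import torch.nn as nn

class CodeDepthDecoder(nn.Module):
    def __init__(self, code_dim=32):
        super().__init__()
        self.img_enc = nn.Conv2d(1, 16, 3, padding=1)          # image features
        self.head = nn.Conv2d(16 + code_dim, 1, 3, padding=1)  # features + code -> depth

    def forward(self, image, code):
        # image: (B, 1, H, W) intensity, code: (B, code_dim) compact geometry code
        feats = torch.relu(self.img_enc(image))
        code_map = code[:, :, None, None].expand(-1, -1, *image.shape[-2:])
        return self.head(torch.cat([feats, code_map], dim=1))  # (B, 1, H, W) depth

decoder = CodeDepthDecoder()
code = torch.zeros(1, 32, requires_grad=True)     # per-keyframe code to optimise
depth = decoder(torch.rand(1, 1, 64, 64), code)   # differentiable w.r.t. the code
```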
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
TLDR
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
Deep Volumetric Video From Very Sparse Multi-view Performance Capture
TLDR
This work focuses on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results.
Occupancy Networks: Learning 3D Reconstruction in Function Space
TLDR
This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction methods that encodes a description of the 3D output at infinite resolution without excessive memory footprint, and validate that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.
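The "infinite resolution without excessive memory footprint" claim above follows from representing occupancy as a learned function of continuous 3D coordinates rather than as a voxel grid. A hedged sketch of such a decoder (layer sizes and the latent-code dimension are assumptions):

```python
# Occupancy-network-style decoder: an MLP maps a 3D query point plus a shape
# code to an occupancy probability, so the shape can be queried at any resolution.
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, code):
        # points: (B, N, 3) query coordinates, code: (B, code_dim) per-shape latent
        code = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        return torch.sigmoid(self.mlp(torch.cat([points, code], dim=-1)))

decoder = OccupancyDecoder()
pts = torch.rand(1, 4096, 3) * 2 - 1            # query points in [-1, 1]^3
occ = decoder(pts, torch.randn(1, 128))         # (1, 4096, 1) occupancy in [0, 1]
```

The memory cost of the representation is the network weights plus the code, independent of how densely the occupancy field is later sampled for meshing.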
DeepVoxels: Learning Persistent 3D Feature Embeddings
TLDR
This work proposes DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry, based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure.
ElasticFusion: Real-time dense SLAM and light source estimation
TLDR
It is shown that a novel approach to real-time dense visual simultaneous localisation and mapping enables more realistic augmented reality rendering; a richer understanding of the scene beyond pure geometry and more accurate and robust photometric tracking.