NeuralFusion: Online Depth Fusion in Latent Space
@article{Weder2021NeuralFusionOD,
  title   = {NeuralFusion: Online Depth Fusion in Latent Space},
  author  = {Silvan Weder and Johannes L. Sch{\"o}nberger and Marc Pollefeys and Martin R. Oswald},
  journal = {2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2021},
  pages   = {3161-3171}
}
We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space. While previous fusion methods use an explicit scene representation like signed distance functions (SDFs), we propose a learned feature representation for the fusion. The key idea is a separation between the scene representation used for the fusion and the output scene representation, via an additional translator network. Our neural network architecture consists of two main parts: a…
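The separation between the fusion representation and the output representation can be made concrete with a small sketch. The following is a minimal PyTorch illustration of the two-network idea, not the authors' implementation; all module names, layer sizes, and the dense-grid layout are assumptions:

```python
# Minimal sketch (assumed architecture, not the paper's released code): one
# network fuses each depth observation into a latent feature volume, and a
# separate translator network decodes that volume into an explicit output,
# here a TSDF-like grid.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Integrates one new depth observation into the latent scene volume."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.update = nn.Sequential(
            nn.Conv3d(feat_dim + 1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, latent, depth_vol):
        # latent: (B, F, D, H, W); depth_vol: (B, 1, D, H, W).
        # Residual update preserves previously fused information.
        return latent + self.update(torch.cat([latent, depth_vol], dim=1))

class TranslatorNet(nn.Module):
    """Decodes the latent volume into the explicit output representation."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv3d(feat_dim, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, 1, kernel_size=1),
        )

    def forward(self, latent):
        return torch.tanh(self.decode(latent))  # TSDF-like values in [-1, 1]

# Online loop: fuse frames one by one; decode geometry only when requested.
fusion, translator = FusionNet(), TranslatorNet()
latent = torch.zeros(1, 8, 32, 32, 32)                 # empty latent volume
for depth_vol in (torch.randn(1, 1, 32, 32, 32) for _ in range(3)):
    latent = fusion(latent, depth_vol)                 # incremental fusion
tsdf = translator(latent)                              # explicit geometry
```

The point of the separation, as the abstract describes it, is that the fused state is not forced to be a scalar SDF: the latent features can carry richer per-voxel evidence, and only the translator commits to the output representation.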
14 Citations
DeepSurfels: Learning Online Appearance Fusion
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
An end-to-end trainable online appearance fusion pipeline is presented that fuses information from RGB images into the proposed scene representation and is trained with self-supervision from the reprojection error against the input images.
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
- Computer Science · 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
To the best of the authors' knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time, and it outperforms state-of-the-art methods in both accuracy and speed.
BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
- Computer Science · ArXiv
- 2022
This work proposes a novel bi-level fusion strategy that considers both efficiency and reconstruction quality by design, and evaluates the proposed method on multiple datasets quantitatively and qualitatively, demonstrating a significant improvement over existing methods.
Learning Online Multi-Sensor Depth Fusion
- Computer Science · ArXiv
- 2022
SenFuNet is introduced: a depth fusion approach that learns sensor-specific noise and outlier statistics, combines depth-frame streams from different sensors in an online fashion, and outperforms traditional and recent online depth fusion approaches.
HRBF-Fusion: Accurate 3D Reconstruction from RGB-D Data Using On-the-Fly Implicits
- Physics · ACM Transactions on Graphics
- 2022
Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level…
Multi-sensor large-scale dataset for multi-view 3D reconstruction
- Computer Science · ArXiv
- 2022
We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense,…
VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction
- Computer Science · ArXiv
- 2022
This work introduces VPFusion, a unified single- and multi-view neural implicit 3D reconstruction framework, and proposes a novel interleaved 3D reasoning and pairwise view association architecture for feature volume fusion across views.
CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-scale Indoor Scene
- Computer Science · ArXiv
- 2021
CIRCLE is a framework for large-scale scene completion and geometric refinement based on an end-to-end sparse convolutional network, CircNet, that jointly models local geometric details and global scene structural contexts, allowing it to preserve fine-grained object detail while recovering missing regions commonly arising in traditional 3D scene data.
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
It is shown for the first time that a single network can represent scene geometry over time continually without catastrophic forgetting, while achieving promising trade-offs between accuracy and efficiency.
Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
- Computer Science
- 2021
The proposed Gradient-SDF is a novel representation for 3D geometry that combines the advantages of implicit and explicit representations and is as well suited to (GPU) parallelization as related approaches.
References
Showing 1-10 of 78 references
OctNetFusion: Learning Depth Fusion from Data
- Computer Science · 2017 International Conference on 3D Vision (3DV)
- 2017
This paper presents a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps and significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression (that classic update is sketched after this list).
DeepTAM: Deep Tracking and Mapping with Convolutional Neural Networks
- Computer Science · International Journal of Computer Vision
- 2019
This work presents a system for dense keyframe-based camera tracking and depth map estimation that is entirely learned, and shows that generating a large number of pose hypotheses leads to more accurate predictions.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
- Computer Science · NeurIPS
- 2019
Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work introduces a new compact and optimisable semantic representation by training a variational auto-encoder conditioned on a colour image and jointly optimising the low-dimensional codes associated with each of a set of overlapping images, producing consistent fused label maps that preserve spatial correlation.
CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A new compact but dense representation of scene geometry is presented, conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters.
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
- Computer Science · 2017 IEEE International Conference on Robotics and Automation (ICRA)
- 2017
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
Deep Volumetric Video From Very Sparse Multi-view Performance Capture
- Computer Science · ECCV
- 2018
This work focuses on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results.
Occupancy Networks: Learning 3D Reconstruction in Function Space
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction that encodes a description of the 3D output at infinite resolution without excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.
DeepVoxels: Learning Persistent 3D Feature Embeddings
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work proposes DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry, based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure.
ElasticFusion: Real-time dense SLAM and light source estimation
- Computer Science · Int. J. Robotics Res.
- 2016
It is shown that a novel approach to real-time dense visual simultaneous localisation and mapping enables more realistic augmented reality rendering, a richer understanding of the scene beyond pure geometry, and more accurate and robust photometric tracking.
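For contrast with the learned approaches above: the traditional volumetric fusion baseline that OctNetFusion compares against (and that the NeuralFusion abstract contrasts with latent-space fusion) is the weighted running-average TSDF update of Curless and Levoy. A minimal per-voxel sketch with hypothetical array names:

```python
# Classic weighted running-average TSDF fusion (Curless & Levoy style);
# illustrative names, not taken from any of the papers listed above.
import numpy as np

def tsdf_update(D, W, d_new, w_new):
    """Fuse one new truncated signed-distance observation into each voxel.

    D, W   : current TSDF values and accumulated weights (same-shape arrays)
    d_new  : per-voxel truncated signed distance from the new depth map
    w_new  : per-voxel confidence weight of the new observation
    """
    W_out = W + w_new
    D_out = (W * D + w_new * d_new) / np.maximum(W_out, 1e-8)
    return D_out, W_out
```

Because each voxel keeps only a scalar distance and weight, outliers are averaged in rather than filtered out; the learned latent-space fusion described in the abstract is motivated by exactly this limitation.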