Corpus ID: 245385791

NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

@article{Zhu2021NICESLAMNI,
  title={NICE-SLAM: Neural Implicit Scalable Encoding for SLAM},
  author={Zihan Zhu and Songyou Peng and Viktor Larsson and Weiwei Xu and Hujun Bao and Zhaopeng Cui and Martin R. Oswald and Marc Pollefeys},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.12130}
}
Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). Nevertheless, existing methods produce over-smoothed scene reconstructions and have difficulty scaling up to large scenes. These limitations are mainly due to their simple fully-connected network architecture, which does not incorporate local information from the observations. In this paper, we present NICE-SLAM, a dense SLAM…
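The abstract's argument is that a single fully-connected MLP ignores local structure, whereas querying learnable feature grids at multiple resolutions gives each point both local detail and scene-level context. A minimal sketch of that idea (not NICE-SLAM's actual implementation; grid sizes, feature dimension, and function names are illustrative assumptions):

```python
import numpy as np

def trilinear_lookup(grid, p):
    """Interpolate a feature vector at a continuous point p in [0, 1]^3.

    grid: (R, R, R, C) array of learnable features stored at voxel corners.
    """
    R = grid.shape[0]
    x = p * (R - 1)                                  # scale to grid coordinates
    i0 = np.clip(np.floor(x).astype(int), 0, R - 2)  # lower corner index
    t = x - i0                                       # fractional offsets in [0, 1]
    f = 0.0
    # Blend the 8 surrounding corner features with trilinear weights.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                f = f + w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return f

# Hierarchical encoding: concatenate features from grids at several resolutions,
# so a query point sees fine local detail plus coarse context.
rng = np.random.default_rng(0)
grids = [rng.standard_normal((r, r, r, 8)) for r in (8, 32)]  # coarse, fine
point = np.array([0.3, 0.7, 0.5])
feature = np.concatenate([trilinear_lookup(g, point) for g in grids])
# `feature` (16-dim here) would then be decoded by a small MLP into
# occupancy/color, rather than feeding raw coordinates to one global MLP.
```

Because the features live in the grid cells, an optimizer can update geometry locally around each observation, which is the scalability argument the paper makes against a single global MLP.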
MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
TLDR
It is demonstrated that depth and normal cues, predicted by general-purpose monocular estimators, significantly improve reconstruction quality and reduce optimization time, and that these geometric monocular priors improve performance for both small-scale single-object and large-scale multi-object scenes, independent of the choice of representation.
iSDF: Real-Time Neural Signed Distance Fields for Robot Perception
TLDR
iSDF produces more accurate reconstructions, and better approximations of collision costs and gradients useful for downstream planners in domains from navigation to manipulation, in evaluations against alternative methods on real and synthetic datasets of indoor environments.
An Algorithm for the SE(3)-Transformation on Neural Implicit Maps for Remapping Functions
TLDR
This work presents a transformation algorithm for neural implicit maps that, embedded into a SLAM framework, is able to tackle the remapping of loop closures and demonstrates high-quality surface reconstruction.
GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation
TLDR
Gaussian Activated neural Radiance Fields (GARF) is presented as a new positional embedding-free neural radiance field architecture – employing Gaussian activations – that outperforms the current state-of-the-art in terms of high fidelity reconstruction and pose estimation.
GO-Surf: Neural Feature Grid Optimization for Fast, High-Fidelity RGB-D Surface Reconstruction
TLDR
GO-Surf is presented, a direct feature grid optimization method for accurate and fast surface reconstruction from RGB-D sequences that can optimize sequences of 1-2K frames in 15-45 minutes, a speedup over NeuralRGB-D, the most related approach based on an MLP representation, while maintaining on-par performance on standard benchmarks.
V4D: Voxel for 4D Novel View Synthesis
TLDR
The proposed LUTs-based refinement module achieves a performance gain at little computational cost and can serve as a plug-and-play module in the novel view synthesis task.
SDF-based RGB-D Camera Tracking in Neural Scene Representations
TLDR
This work proposes to track an RGB-D camera using a signed distance-based representation and shows that, compared to density-based representations, tracking can be sped up, which enables more robust and accurate pose estimates when computation time is limited.
Latent Partition Implicit with Surface Codes for 3D Representation
TLDR
The insight here is that both part learning and part blending can be conducted much more easily in the latent space than in the spatial space; LPI outperforms the latest methods on widely used benchmarks in terms of reconstruction accuracy and modeling interpretability.
Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors
TLDR
The key idea is to infer signed distances by pushing both the query projections to be on the surface and the projection distance to be the minimum, which achieves state-of-the-art reconstruction accuracy, especially for sparse point clouds.
Depth Field Networks for Generalizable Multi-view Scene Representation
TLDR
This paper proposes to learn an implicit, multi-view consistent scene representation by introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity, and shows that introducing view synthesis as an auxiliary task further improves depth estimation.
...

References

Showing 1-10 of 71 references
DeepFactors: Real-Time Probabilistic Dense Monocular SLAM
TLDR
A SLAM system that unifies these methods in a probabilistic framework while still maintaining real-time performance, through the use of a learned compact depth map representation and a reformulation of three different error types: photometric, reprojection, and geometric.
CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
TLDR
A new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters is presented.
Convolutional Occupancy Networks
TLDR
Convolutional Occupancy Networks is proposed, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes that enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
iMAP: Implicit Mapping and Positioning in Real-Time
We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live…
DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
TLDR
This work introduces DROID-SLAM, a new deep learning based SLAM system that is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures.
ElasticFusion: Dense SLAM Without A Pose Graph
TLDR
This system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps.
BAD SLAM: Bundle Adjusted Direct RGB-D SLAM
TLDR
A novel, fast direct BA formulation is presented which is implemented in a real-time dense RGB-D SLAM algorithm, and the proposed algorithm outperforms all other evaluated SLAM methods.
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo
TLDR
Experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry methods in camera tracking and 3D reconstruction; it introduces a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model built incrementally from dense depth predictions.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
TLDR
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations
TLDR
This work introduces a new compact and optimisable semantic representation by training a variational auto-encoder conditioned on a colour image and jointly optimising the low-dimensional codes associated with each of a set of overlapping images, producing consistent fused label maps which preserve spatial correlation.
...