VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry

@article{Radwan2018VLocNetDM,
  title={VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry},
  author={Noha Radwan and Abhinav Valada and Wolfram Burgard},
  journal={IEEE Robotics and Automation Letters},
  year={2018},
  volume={3},
  pages={4407-4414}
}
Semantic understanding and localization are fundamental enablers of robot autonomy that have been tackled as disjoint problems for the most part. While deep learning has enabled recent breakthroughs across a wide spectrum of scene understanding tasks, its applicability to state estimation tasks has been limited due to the direct formulation that renders it incapable of encoding scene-specific constraints. In this letter, we propose the VLocNet++ architecture that employs a multitask learning… 
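The multitask formulation described in the abstract can be pictured as a single shared encoder feeding task-specific heads for global pose regression, visual odometry, and semantic segmentation. The following PyTorch sketch illustrates only that pattern; the layer sizes, module names, and the crude segmentation head are placeholder assumptions, not the published VLocNet++ architecture.

import torch
import torch.nn as nn

class MultitaskLocalizationNet(nn.Module):
    """Illustrative shared-encoder multitask network: global 6-DoF pose,
    relative odometry between consecutive frames, and coarse semantic
    logits. All shapes are placeholders, not the published model."""

    def __init__(self, num_classes=20):
        super().__init__()
        # Shared convolutional encoder (stand-in for a ResNet-style backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 6-DoF pose as 3-D translation plus 4-D quaternion (7 values).
        self.pose_head = nn.Linear(128, 7)
        # Relative-motion head sees features from both frames.
        self.odom_head = nn.Linear(2 * 128, 7)
        # A real model would use a decoder; a 1x1 conv keeps the sketch short.
        self.seg_head = nn.Conv2d(128, num_classes, 1)

    def forward(self, img_t, img_prev):
        feat_t = self.encoder(img_t)
        feat_prev = self.encoder(img_prev)
        vec_t = self.pool(feat_t).flatten(1)
        vec_prev = self.pool(feat_prev).flatten(1)
        global_pose = self.pose_head(vec_t)
        rel_motion = self.odom_head(torch.cat([vec_t, vec_prev], dim=1))
        seg_logits = self.seg_head(feat_t)
        return global_pose, rel_motion, seg_logits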
Citations

Incorporating Semantic and Geometric Priors in Deep Pose Regression
TLDR
This work proposes the VLocNet++ architecture that overcomes this limitation by simultaneously embedding geometric and semantic knowledge of the world into the pose regression network, and proposes the novel Geometric Consistency Loss function that leverages the predicted relative motion estimated from odometry to constrain the search space during training.
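As a rough illustration of the geometric-consistency idea, the relative motion implied by two consecutive global pose predictions can be penalized for disagreeing with the odometry stream's own relative-motion prediction. The sketch below is a simplification under stated assumptions: it approximates the implied motion by a plain componentwise difference of translation and quaternion parts, whereas a faithful implementation composes full SE(3) poses.

import torch

def geometric_consistency_loss(pose_t, pose_prev, rel_motion):
    # Poses are (B, 7): translation[0:3] + quaternion[3:7].
    # Approximate the motion implied by consecutive global predictions
    # by componentwise differences (a shortcut for brevity only).
    implied_t = pose_t[:, :3] - pose_prev[:, :3]
    implied_q = pose_t[:, 3:] - pose_prev[:, 3:]
    # Penalize disagreement with the odometry stream's prediction.
    t_err = torch.norm(implied_t - rel_motion[:, :3], dim=1)
    q_err = torch.norm(implied_q - rel_motion[:, 3:], dim=1)
    return (t_err + q_err).mean()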
Deep auxiliary learning for visual localization using colorization task
TLDR
This work proposes a novel auxiliary learning strategy for camera localization by introducing scene-specific high-level semantics from a self-supervised representation learning task and shows that this model significantly improves localization accuracy over the state of the art on both indoor and outdoor datasets.
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
TLDR
A multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner is proposed.
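Adaptive multimodal fusion of this kind is often implemented as a gating block: concatenate the modality-specific feature maps, predict per-element weights from a bottleneck, reweight, and fuse. The PyTorch sketch below shows that generic pattern; the layer shapes and reduction factor are assumptions, and it is not the paper's exact fusion block.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated fusion of two modality-specific feature maps."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels // reduction, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 3, padding=1),
            nn.Sigmoid(),  # per-element weights in [0, 1]
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_a, feat_b):
        stacked = torch.cat([feat_a, feat_b], dim=1)
        return self.fuse(stacked * self.gate(stacked))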
Learning Reliable and Scalable Representations Using Multimodal Multitask Deep Learning
TLDR
This work enables models to effectively learn fused representations from multiple modalities and across tasks, exploiting complementary features and cross-modal interdependencies, and facilitates self-supervised model adaptation.
Hierarchical Joint Scene Coordinate Classification and Regression for Visual Localization
TLDR
This work presents a new hierarchical joint classification-regression network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image and achieves state-of-the-art single-image RGB localization performance on three benchmark datasets.
DA4AD: End-to-end Deep Attention Aware Features Aided Visual Localization for Autonomous Driving
TLDR
This work seeks to exploit the deep attention mechanism to search for salient, distinctive and stable features that are good for long-term matching in the scene through a novel end-to-end deep neural network, leading to a potential low-cost localization solution for autonomous driving.
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence
TLDR
This work provides a comprehensive survey and a new taxonomy for localization and mapping using deep learning, revisits the problem of perceiving self-motion and scene understanding with on-board sensors, and shows how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS).
Semantically-Aware Attentive Neural Embeddings for 2D Long-Term Visual Localization
TLDR
This work proposes a novel end-to-end deep attention-based framework that utilizes multimodal cues to generate robust embeddings for 2D-VL and predicts a shared channel attention and modality-specific spatial attentions to guide the embeddings to focus on more reliable image regions.
MVLoc: Multimodal Variational Geometry-Aware Learning for Visual Localization
TLDR
This work proposes an end-to-end framework to fuse different sensor inputs through a variational Product-of-Experts (PoE) joint encoder followed by attention-based fusion, and shows how accuracy can be increased through importance sampling and reparameterization of the latent space.
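When every modality-specific encoder outputs a diagonal Gaussian over the latent code, the Product-of-Experts posterior has a convenient closed form: precisions add, and the mean is the precision-weighted average of the expert means. A minimal NumPy sketch of that standard combination (the fusion step only, not the full MVLoc pipeline):

import numpy as np

def poe_gaussian(mus, logvars):
    """Product of diagonal-Gaussian experts N(mu_i, var_i).
    mus, logvars: arrays of shape (num_experts, latent_dim)."""
    precisions = np.exp(-np.asarray(logvars))             # 1 / var_i
    joint_var = 1.0 / precisions.sum(axis=0)              # precisions add
    joint_mu = joint_var * (np.asarray(mus) * precisions).sum(axis=0)
    return joint_mu, joint_var

For example, fusing two unit-variance experts with means 0 and 2 yields mean 1 and variance 0.5: the joint estimate is both averaged and sharpened.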
SIVO: Semantically Informed Visual Odometry and Mapping
TLDR
SIVO (Semantically Informed Visual Odometry and Mapping), a novel feature selection method for visual SLAM which incorporates machine learning and neural network uncertainty into an information-theoretic approach to feature selection, is presented.

References

Showing 1-10 of 38 references
Deep Auxiliary Learning for Visual Localization and Odometry
TLDR
This work proposes VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images, and proposes a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates.
DeepVO: A Deep Learning approach for Monocular Visual Odometry
TLDR
A Convolutional Neural Network architecture is proposed, best suited for estimating the camera's pose under known environment conditions, and shows promising results in inferring the actual scale from just a single camera in real time.
AdapNet: Adaptive semantic segmentation in adverse environmental conditions
TLDR
This paper proposes a novel semantic segmentation architecture and the convoluted mixture of deep experts (CMoDE) fusion technique that enables a multi-stream deep neural network to learn features from complementary modalities and spectra, each of which are specialized in a subset of the input space.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
TLDR
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
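Concretely, the homoscedastic-uncertainty weighting amounts to minimizing sum_i L_i / (2*sigma_i^2) + log sigma_i, where each sigma_i is a learned task-noise parameter. A minimal PyTorch sketch follows; parametrizing s_i = log sigma_i^2 for numerical stability is a common implementation choice, not something mandated by the paper.

import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, num_tasks):
        super().__init__()
        # s_i = log(sigma_i^2), so sigma_i > 0 is implicit.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros((), device=self.log_vars.device)
        for loss, s in zip(task_losses, self.log_vars):
            # 0.5 * (exp(-s) * L + s) == L / (2 sigma^2) + log sigma
            total = total + 0.5 * (torch.exp(-s) * loss + s)
        return total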
Deep regression for monocular camera-based 6-DoF global localization in outdoor environments
  • Tayyab Naseer, W. Burgard
  • Computer Science
    2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2017
TLDR
This paper proposes an approach for directly regressing a 6-DoF camera pose using CNNs and a single monocular RGB image, evaluates its localization accuracy on publicly available datasets, and shows that it outperforms CNN-based state-of-the-art methods.
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
TLDR
This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
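The PoseNet objective regresses translation x and orientation quaternion q jointly, balancing the two terms with a scale factor beta: loss = ||x_hat - x|| + beta * ||q_hat - q/||q||||. A minimal sketch; the beta default below is a placeholder, since the paper tunes it per scene (indoor and outdoor scales differ by orders of magnitude).

import torch

def posenet_loss(pred_t, pred_q, gt_t, gt_q, beta=250.0):
    # Euclidean translation error.
    t_err = torch.norm(pred_t - gt_t, dim=1)
    # Quaternion error against the normalized ground-truth quaternion.
    gt_q_unit = gt_q / gt_q.norm(dim=1, keepdim=True)
    q_err = torch.norm(pred_q - gt_q_unit, dim=1)
    return (t_err + beta * q_err).mean()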
MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
TLDR
This paper presents an approach to joint classification, detection and semantic segmentation using a unified architecture where the encoder is shared amongst the three tasks, and performs extremely well on the challenging KITTI dataset.
Modelling uncertainty in deep learning for camera relocalization
  • Alex Kendall, R. Cipolla
  • Computer Science
    2016 IEEE International Conference on Robotics and Automation (ICRA)
  • 2016
TLDR
A Bayesian convolutional neural network is used to regress the 6-DOF camera pose from a single RGB image and an estimate of the model's relocalization uncertainty is obtained to improve state-of-the-art localization accuracy on a large-scale outdoor dataset.
Deep Learning for Laser Based Odometry Estimation
TLDR
This paper takes advantage of recent advances in deep learning for image classification to estimate transforms between consecutive point clouds, using convolutional neural networks to reduce the state space of the laser scan.
Delving deeper into convolutional neural networks for camera relocalization
TLDR
A variant of Euler angles named Euler6 is proposed to represent orientation, a data augmentation method named pose synthesis is designed to reduce the sparsity of poses in the whole pose space and cope with overfitting during training, and a multi-task CNN named BranchNet is introduced to deal with the complex coupling of orientation and translation.
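Assuming Euler6 denotes the (sin, cos) encoding of the three Euler angles (six values in total, which removes the ±pi wrap-around discontinuity that makes raw angles hard to regress), a minimal sketch of the encoding and its inverse:

import numpy as np

def euler_to_euler6(angles):
    """angles: array of 3 Euler angles -> 6-vector [sin(a), cos(a)]."""
    angles = np.asarray(angles)
    return np.concatenate([np.sin(angles), np.cos(angles)])

def euler6_to_euler(e6):
    """Invert the encoding with atan2, which handles all quadrants."""
    e6 = np.asarray(e6)
    return np.arctan2(e6[:3], e6[3:])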