Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

@inproceedings{Nekrasov2019RealTimeJS,
  title={Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations},
  author={Vladimir Nekrasov and Thanuja Dharmasiri and Andrew Spek and Tom Drummond and Chunhua Shen and Ian D. Reid},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)},
  year={2019},
  pages={7101--7107}
}
Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. [...] To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful ‘teacher’ network. We…
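
The hard knowledge distillation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "hard" means the teacher's per-pixel argmax class stands in for a missing annotation; the function names and the flat list-of-pixels layout are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first index on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def hard_targets(
    ground_truth: Optional[List[int]],
    teacher_scores: List[List[float]],
) -> List[int]:
    """Pick per-pixel training labels for one image.

    ground_truth:   per-pixel class labels, or None when this image has
                    no annotation for the task (the asymmetric case).
    teacher_scores: per-pixel class scores from a teacher network
                    (hypothetical layout: one list of scores per pixel).
    """
    if ground_truth is not None:
        # Real annotations are used whenever they exist.
        return list(ground_truth)
    # Otherwise the teacher's most confident class becomes the hard label.
    return [argmax(pixel_scores) for pixel_scores in teacher_scores]


# Two pixels, three classes.
scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]
labels_without_annotation = hard_targets(None, scores)
labels_with_annotation = hard_targets([2, 2], scores)
```

In this sketch the student would then be trained on the returned labels exactly as if they were ground truth, so images annotated for only one task can still contribute to both.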

Real-Time Semantic Segmentation With Fast Attention
TLDR
The proposed architecture relies on fast spatial attention, a simple yet efficient modification of the popular self-attention mechanism that captures the same rich spatial context at a small fraction of the computational cost by changing the order of operations.
Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems
TLDR
A novel, low-complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments, aimed at applications on resource-constrained platforms where a monocular camera is the primary perception module.
Hard Pixel Mining for Depth Privileged Semantic Segmentation
TLDR
This paper proposes a novel Loss Weight Module, which outputs a loss weight map by employing two depth-related measurements of hard pixels: Depth Prediction Error and Depth-aware Segmentation Error, and is applied to segmentation loss, with the goal of learning a more robust model by paying more attention to the hard pixels.
Real-time Monocular Depth Estimation with Extremely Light-Weight Neural Network
TLDR
A supervised learning-based CNN with detachable decoders that produce depth predictions at different scales is proposed, and a novel log-depth loss function is formulated that computes the difference between the predicted and ground-truth depth maps in log space, so as to increase the prediction accuracy for nearby locations.
A Semi-Supervised Approach to Monocular Depth Estimation, Depth Refinement, and Semantic Segmentation of Driving Scenes using a Siamese Triple Decoder Architecture
TLDR
A unified learning framework for generating a refined depth estimation map and a semantic segmentation map from a single image; results indicate that the model can effectively utilize both geometric and semantic information.
Learning Geometry and Semantics for Deep Image Restoration / Caner Hazırbaş; reviewers: Ian Reid, Daniel Cremers; supervisor: Daniel Cremers
TLDR
This thesis presents a fusion-based CNN architecture to incorporate depth into semantic segmentation and proposes a multimodal CNN architecture that exploits pixelwise semantic labels in addition to color information and thus improves image restoration tasks.
SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches
TLDR
A novel network architecture, the Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN), extracts multi-resolution depth features and semantic features, which are merged and fed into the decoder, with the goal of predicting depth with the support of semantics.
Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation
TLDR
This paper proposes a novel attention-based dual supervised decoder for RGBD semantic segmentation with superior performance against state-of-the-art methods, and introduces a dual-branch decoder to effectively leverage the correlations and complementary cues of different tasks.
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
TLDR
This work focuses on dense per-pixel tasks, in particular semantic image segmentation using fully convolutional networks. It relies on a progressive strategy that stops training non-promising architectures early, and on Polyak averaging coupled with knowledge distillation to speed up convergence.
...

References

SHOWING 1-10 OF 54 REFERENCES
Light-Weight RefineNet for Real-Time Semantic Segmentation
TLDR
This work adapts a powerful semantic segmentation architecture, called RefineNet, into the more compact one, suitable even for tasks requiring real-time performance on high-resolution inputs, and proposes two modifications aimed to decrease the number of parameters and floating point operations.
Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks
TLDR
This work presents a new model for simultaneous depth estimation and semantic segmentation from a single RGB image and couples the deep CNN with a fully connected CRF, which captures the contextual relationships and interactions between the semantic and depth cues, improving the accuracy of the final results.
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
TLDR
An image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance is proposed to address the challenging task of real-time semantic segmentation, and an in-depth analysis of the framework is provided.
BlitzNet: A Real-Time Deep Network for Scene Understanding
TLDR
A deep architecture called BlitzNet is proposed that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computation, and shows state-of-the-art performance for object detection and segmentation among real-time systems.
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
TLDR
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation
TLDR
RefineNet is presented, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections and introduces chained residual pooling, which captures rich background context in an efficient manner.
Deeper Depth Prediction with Fully Convolutional Residual Networks
TLDR
A fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps is proposed and a novel way to efficiently learn feature map up-sampling within the network is presented.
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
TLDR
A novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low-latency operation, which is up to 18× faster, requires 75× fewer FLOPs, has 79× fewer parameters, and provides similar or better accuracy than existing models.
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
TLDR
A novel deep layer cascade (LC) method is proposed to improve the accuracy and speed of semantic segmentation; it is an end-to-end trainable framework, allowing joint learning of all sub-models.
Rethinking Atrous Convolution for Semantic Image Segmentation
TLDR
The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains performance comparable with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
...