Simultaneous Object Detection and Semantic Segmentation

@article{Salscheider2020SimultaneousOD,
  title={Simultaneous Object Detection and Semantic Segmentation},
  author={Niels Ole Salscheider},
  journal={ArXiv},
  year={2020},
  volume={abs/1905.02285}
}
Both object detection in and semantic segmentation of camera images are important tasks for automated vehicles. Object detection is necessary so that the planning and behavior modules can reason about other road users. Semantic segmentation provides, for example, free-space information and information about static and dynamic parts of the environment. There has been a lot of research on solving both tasks with Convolutional Neural Networks. These approaches give good results but are…
Realtime 3D Object Detection for Automated Driving Using Stereo Vision and Semantic Information
TLDR
This work proposes a 3D object detection and pose estimation method for automated driving using stereo images. It focuses not only on cars but on all types of road users, and it can ensure real-time capability through a GPU implementation of the entire processing chain.
Monocular Localization in HD Maps by Combining Semantic Segmentation and Distance Transform
TLDR
This work proposes using semantic segmentation on a monocular camera to localize directly in an HD map as used for automated driving. The approach combines lightweight yet powerful HD maps with the simplicity of monocular vision and the flexibility of neural networks.
Learning-Based Shape Estimation with Grid Map Patches for Realtime 3D Object Detection for Automated Driving
TLDR
This paper proposes an approach that projects the 3D points of image-based bounding box proposals into so-called grid map patches. This makes it the fastest stereo-based 3D object detector on the KITTI benchmark while still achieving results within the range of the best image-based algorithms.
A Versatile Machine Vision Algorithm for Real-Time Counting Manually Assembled Pieces
TLDR
This work presents a machine vision algorithm that deals effectively with human interactions inside a framed area. It requires no training and is therefore extremely flexible: only minor changes to the working parameters are needed to transfer it to other objects, which makes it appropriate for plant-wide implementation.
Vision-based Lifting of 2D Object Detections for Automated Driving
TLDR
This paper proposes a pipeline that lifts the results of existing vision-based 2D algorithms to 3D detections using only cameras, as a cost-effective alternative to LiDAR. It is the first to use a 2D CNN to process the point cloud for each 2D detection, in order to keep the computational effort as low as possible.
Image semantic segmentation with an improved fully convolutional network
TLDR
Experimental results show that the three improvements proposed in this paper enable the model to obtain more expressive features and improve the accuracy of the algorithm.
Refining Semantic Segmentation with Superpixels using Transparent Initialization and Sparse Encoder
Although deep learning greatly improves the performance of semantic segmentation, its success mainly lies in object central areas without accurate edges. As superpixels are a popular and effective…
Semantic Evidential Grid Mapping based on Stereo Vision
TLDR
An improved method to estimate a semantic evidential multi-layer grid map using depth from stereo vision paired with pixel-wise semantically annotated images and incorporating a disparity-based ground surface estimation in the inverse perspective mapping is presented.
Deep Learning Superpixel Semantic Segmentation with Transparent Initialization and Sparse Encoder
TLDR
This paper jointly learns semantic segmentation with trainable superpixels by adding fully-connected layers with transparent initialization, and uses an efficient logit uniformization with a sparse encoder to reduce the large computational complexity arising from indexing pixels by superpixels.
Multi-modal semantic image segmentation
TLDR
This work proposes a new and efficient network architecture that outperforms the traditional Mask R-CNN method by better exploiting the output features of CNNs, and extends the proposed multi-modal semantic segmentation method to two additional modalities: heatmap and IR images.

References

Showing 1-10 of 29 references.
Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving
TLDR
A joint multi-task network design that learns object detection and semantic segmentation simultaneously by sharing one encoder between both tasks. The architecture is kept efficient by using a small ResNet10-like encoder that is shared by both decoders.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
Segmentation as selective search for object recognition
TLDR
This work adapts segmentation as a selective search, reconsidering segmentation so that it generates many approximate locations rather than few precise object delineations. The reasoning is that an object whose location is never generated cannot be recognised, and that appearance and immediate nearby context are most effective for object recognition.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Deep Semantic Lane Segmentation for Mapless Driving
TLDR
A novel pipeline using a deep neural network to detect lane semantics and topology given RGB images is presented, showing accurate ego lane detection, including lane semantics, in challenging scenarios for autonomous driving.
Are we ready for autonomous driving? The KITTI vision benchmark suite
TLDR
The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross-entropy loss such that it down-weights the loss assigned to well-classified examples. The resulting novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
The Pascal Visual Object Classes Challenge: A Retrospective
TLDR
A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
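The Focal Loss reference describes its loss only in words. As a hedged illustration, its binary, alpha-balanced form is usually written FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the probability the model assigns to the true class. The minimal Python sketch below evaluates it for a single prediction next to plain cross-entropy. The function names and the defaults gamma = 2 and alpha = 0.25 are assumptions chosen for illustration; this is not code from the cited paper or from the paper this page describes.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one prediction (p = predicted foreground probability)."""
    # p_t is the probability assigned to the ground-truth class.
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Alpha-balanced binary focal loss for one prediction (illustrative sketch).

    p:     predicted probability of the foreground class, strictly in (0, 1)
    y:     ground-truth label, 1 for foreground, 0 for background
    gamma: focusing parameter (assumed default); gamma = 0 gives weighted cross-entropy
    alpha: weight of the foreground class (assumed default)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The modulating factor (1 - p_t)^gamma approaches 0 for well-classified
    # examples (p_t near 1), so easy negatives contribute almost nothing.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Easy background example vs. hard foreground example.
    for p, y in [(0.01, 0), (0.1, 1)]:
        print(f"p={p}, y={y}: CE={cross_entropy(p, y):.6f}, FL={focal_loss(p, y):.2e}")
```

For the easy background example (p = 0.01, y = 0), cross-entropy is about 0.01, while the focal loss is more than ten thousand times smaller; the hard foreground example (p = 0.1, y = 1) keeps a substantial loss. This is the down-weighting of well-classified examples described in the entry. Setting gamma = 0 reduces the sketch to alpha-weighted cross-entropy.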