• Corpus ID: 60440409

NeurAll: Towards a Unified Model for Visual Perception in Automated Driving

  author: Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Stefan Milz, Senthil Kumar Yogamani, Samir A. Rawashdeh
Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks, including object recognition, motion and depth estimation, and visual SLAM. However, these tasks are typically explored and modeled independently. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks…
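The core idea in the abstract — one shared set of expensive initial convolutional layers feeding lightweight task-specific heads — can be sketched as follows. This is a minimal illustration of hard parameter sharing, not the paper's actual NeurAll architecture; the layer sizes, task heads, and class count are all hypothetical.

```python
# Minimal sketch of a shared encoder with task-specific heads
# (illustrative layer sizes and task names, not the NeurAll design).
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Expensive initial convolutional layers, computed once per frame."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskNet(nn.Module):
    """One shared encoder feeding lightweight task-specific heads."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = SharedEncoder()
        # Segmentation head: per-pixel class scores.
        self.seg_head = nn.Conv2d(32, num_classes, 1)
        # Depth head: one value per pixel.
        self.depth_head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        shared = self.encoder(x)  # computed once, reused by every task
        return self.seg_head(shared), self.depth_head(shared)

net = MultiTaskNet()
seg, depth = net(torch.randn(1, 3, 64, 64))
print(seg.shape, depth.shape)  # both heads reuse the same 16x16 feature map
```

The computational saving comes from the forward pass through the encoder being paid once, while each additional task only adds a cheap head.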


MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning

This work proposes a multi-stream multi-task network that exploits feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion, together with a geometric loss strategy to better handle the differing convergence rates of the tasks.
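The geometric loss idea mentioned in the summary can be sketched as the geometric mean of the per-task losses, so that each task contributes through its relative rather than absolute scale. This is a simplified illustration of the concept, not the exact formulation from the paper.

```python
# Sketch of a geometric-mean combination of per-task losses
# (a simplified illustration of the idea, not the paper's exact loss).
def geometric_loss(task_losses):
    """Geometric mean of per-task losses. Tasks whose losses live on
    different scales (e.g. cross-entropy vs. depth regression) contribute
    through their relative change, which helps when convergence rates differ."""
    n = len(task_losses)
    product = 1.0
    for loss in task_losses:
        product *= loss
    return product ** (1.0 / n)

# Hypothetical per-task losses for segmentation, depth, and motion.
total = geometric_loss([0.5, 2.0, 4.0])
print(total)  # geometric mean, approximately 1.587
```

Compared with a plain weighted sum, the geometric mean avoids one large-scale loss dominating the gradient signal.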

A Comparative Study of Different CNN Encoders for Monocular Depth Prediction

A Convolutional Neural Network is demonstrated in an encoder-decoder architecture to perform monocular depth prediction, and the performance of different CNN encoders is evaluated and compared.

FisheyeMultiNet: Real-time Multitask Learning Architecture for Surround-view Automated Parking System (IMVIP 2019: Irish Machine Vision and Image Processing)

A holistic overview of an industrial system is provided, covering the embedded platform, use cases and the deep learning architecture, and a real-time multi-task deep learning network called FisheyeMultiNet is demonstrated, which detects all the objects necessary for parking on a low-power embedded system.

RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving

This work proposes a real-time end-to-end CNN architecture for moving object detection (MOD) that utilizes spatio-temporal context to improve robustness, constructing a novel time-aware architecture that exploits temporal motion information embedded within sequential images in addition to explicit motion maps derived from optical flow.

Overview and Empirical Analysis of ISP Parameter Tuning for Visual Perception in Autonomous Driving

This paper is part review and part position paper, demonstrating several preliminary results that are promising for future research on the impact of image quality on camera-based perception tasks such as recognition, localization and reconstruction.

FisheyeMODNet: Moving Object detection on Surround-view Cameras for Autonomous Driving

This work proposes a CNN architecture for moving object detection on fisheye images captured in an autonomous driving environment, and designs a lightweight encoder that shares weights across sequential images to target embedded deployment.

WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving

The first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, is released; it comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection and soiling detection.

On the Road With 16 Neurons: Towards Interpretable and Manipulable Latent Representations for Visual Predictions in Driving Scenarios

A strategy for visual perception in the context of autonomous driving is proposed that uses compact representations, with as few as 16 neural units for each of the two basic driving concepts the authors consider: cars and lanes.

DeepTrailerAssist: Deep Learning Based Trailer Detection, Tracking and Articulation Angle Estimation on Automotive Rear-View Camera

  • Ashok Dahal, J. Hossen, D. Troy
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
  • 2019
This work presents all the trailer assist use cases in detail and proposes a deep learning based solution for trailer perception problems, using a proprietary dataset comprising 11 different trailer types to achieve reasonable detection accuracy.

On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks

This paper identifies, within the deep learning framework, two artificial counterparts of the aforementioned neurocognitive theories, and finds a correspondence between the first theoretical idea and the architecture of convolutional autoencoders.

Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning

This work discusses how deep learning can be used to replace parts of the classical Visual SLAM pipeline, and the opportunities for using deep learning to improve upon state-of-the-art classical methods.

Auxiliary Tasks in Multi-task Learning

The proposed deep multi-task CNN architecture was trained on various combinations of tasks using synMT, and the experiments confirmed that auxiliary tasks can indeed boost network performance, both in terms of final results and training time.

Learning the Frame-2-Frame Ego-Motion for Visual Odometry with Convolutional Neural Network

A CNN model is constructed which formulates the pose regression as a supervised learning problem and can achieve better ego-motion estimation compared to the baselines.

DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks

Extensive experiments on the KITTI VO dataset show competitive performance to state-of-the-art methods, verifying that the end-to-end Deep Learning technique can be a viable complement to the traditional VO systems.

UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory

  • Iasonas Kokkinos
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
In this work we train, in an end-to-end manner, a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a network can act like…

Two-Stream Convolutional Networks for Action Recognition in Videos

This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
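The two-stream combination described above is typically realized by fusing the per-class scores of the spatial (appearance) and temporal (optical flow) streams. A minimal sketch of such late fusion by weighted averaging, with entirely hypothetical scores and classes:

```python
# Sketch of two-stream late fusion by weighted score averaging
# (illustrative scores; averaging is one of the fusion schemes used
# in two-stream action recognition, shown here in simplified form).
def late_fusion(spatial_scores, temporal_scores, w=0.5):
    """Weighted average of per-class scores from the two streams."""
    return [w * s + (1.0 - w) * t
            for s, t in zip(spatial_scores, temporal_scores)]

# Hypothetical class scores for three actions from each stream.
spatial = [0.7, 0.2, 0.1]   # appearance stream (single RGB frame)
temporal = [0.3, 0.6, 0.1]  # motion stream (stacked optical flow)
fused = late_fusion(spatial, temporal)
print(fused)  # approximately [0.5, 0.4, 0.1]
```

The two streams are trained separately and only combined at the score level, which is what lets the motion stream perform well despite limited training data.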

Learning temporal features with CNNs for monocular visual ego motion estimation

This work proposes two CNN architectures that are able to learn features for the extraction of this temporal information and are able to solve problems such as ego-motion estimation.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Fast Scene Understanding for Autonomous Driving

This paper presents a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation.

SMSnet: Semantic motion segmentation using deep convolutional neural networks

A novel convolutional neural network architecture that learns to predict both the object label and motion status of each pixel in an image, outperforming existing approaches and achieving state-of-the-art performance on the KITTI dataset as well as the more challenging Cityscapes-Motion dataset, while being substantially faster than existing techniques.