End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera

@inproceedings{song2020endtoend,
  title={End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera},
  author={Zhenbo Song and Jianfeng Lu and Tong Zhang and Hongdong Li},
  booktitle={2020 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2020}
}
Inter-vehicle distance and relative velocity estimation are two basic functions of any Advanced Driver-Assistance System (ADAS). In this paper, we propose a monocular-camera-based inter-vehicle distance and relative velocity estimation method built on end-to-end training of a deep neural network. The key novelty of our method is the integration of multiple visual clues provided by any two time-consecutive monocular frames, including a deep feature clue and a scene geometry clue, as well as…
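As background for the scene-geometry clue mentioned in the abstract, a classical monocular baseline estimates distance from a detected vehicle's pixel height under a pinhole camera model. The sketch below is illustrative only, not the paper's method; the focal length and vehicle height used in the example are assumed values.

```python
# Classical pinhole-geometry baseline for monocular distance estimation:
# if the real-world height H of a vehicle and the focal length f (in pixels)
# are known, the distance is approximately Z = f * H / h, where h is the
# height of the vehicle's bounding box in pixels.
def pinhole_distance(f_px: float, real_height_m: float, bbox_height_px: float) -> float:
    """Approximate distance to a vehicle from its image height."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return f_px * real_height_m / bbox_height_px

# Example with assumed values: f = 720 px, vehicle height 1.5 m,
# bounding-box height 54 px -> 20.0 m
print(pinhole_distance(720.0, 1.5, 54.0))
```

Learned methods such as the one above-cited aim to outperform this geometric baseline by fusing it with deep feature and motion cues rather than relying on a fixed assumed vehicle height.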


Multi-Stream Attention Learning for Monocular Vehicle Velocity and Inter-Vehicle Distance Estimation

A novel multi-stream attention network (MSANet) is proposed to extract different aspects of features, e.g., spatial and contextual features, for joint vehicle velocity and inter-vehicle distance estimation.

Monocular Depth and Velocity Estimation Based on Multi-Cue Fusion

A multi-cue fusion monocular velocity and ranging framework is proposed to improve the accuracy of monocular ranging and velocity measurement and uses the attention mechanism to fuse different feature information.

Enabling Object Detection and Distance Calculation in AI based Autonomous Driving System

  • K. N., Sudhir Shenai
  • Computer Science
    2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon)
  • 2022
Object recognition and distance estimation between two dynamic vehicles are carried out successfully in this project, and YOLO is shown to be one of the best-performing models for object detection in a highly dynamic environment.

Self-Supervised Object Distance Estimation Using a Monocular Camera

A network-based on ShuffleNet and YOLO is used to detect an object, and a self-supervised learning network is used with multi-scale resolution to improve estimation accuracy by enriching the expression ability of depth information.

R4D: Utilizing Reference Objects for Long-Range Distance Estimation

This paper proposes R4D, the first framework to accurately estimate the distance of long-range objects by using references with known distances in the scene, and introduces a challenging and underexplored task, as well as two datasets to validate new methods developed for this task.

Motion Estimation Using Region-Level Segmentation and Extended Kalman Filter for Autonomous Driving

This paper presents a novel approach to estimating the motion state by using region-level instance segmentation and an extended Kalman filter (EKF), and demonstrates that this method performs excellently and outperforms other state-of-the-art methods in both object segmentation and parameter estimation.
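The predict/update cycle underlying such Kalman-filter-based motion estimators can be sketched with a minimal linear, 1-D constant-velocity model. This is not the cited paper's EKF (which linearizes a nonlinear motion model), and the time step and noise parameters below are illustrative assumptions.

```python
import numpy as np

# Minimal linear Kalman filter for a 1-D constant-velocity motion model,
# state = [position, velocity]; an EKF follows the same predict/update
# cycle but linearizes a nonlinear model at each step.
def kf_step(x, P, z, dt=0.1, q=1e-2, r=0.5):
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state-transition model
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process-noise covariance (assumed)
    R = np.array([[r]])                     # measurement-noise covariance (assumed)
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

x = np.array([0.0, 0.0])                    # initial [position, velocity]
P = np.eye(2)                               # initial state covariance
for z in [0.11, 0.19, 0.32, 0.41]:          # toy noisy position measurements
    x, P = kf_step(x, P, np.array([z]))
print(x)                                    # estimated [position, velocity]
```

With monotonically increasing measurements, the filter infers a positive velocity; in the cited work, the measurements would instead come from region-level instance segmentation of vehicles.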

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image

This paper proposes to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation, and solves the 6D object pose problem by aligning the predicted canonical representation with the back-projected object-level depth.

A Review of Vision-Based Traffic Semantic Understanding in ITSs

All kinds of traffic-monitoring analysis methods are classified from the two perspectives of macro traffic flow and micro road behavior, and existing traffic-monitoring challenges and corresponding solutions are analyzed.


References
Camera-based vehicle velocity estimation from monocular video

It is found that light-weight trajectory-based features outperform depth and motion cues extracted from deep ConvNets, especially for far-distance predictions, where current disparity and optical-flow estimators are significantly challenged.

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

This paper proposes a geometry-consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions, and is the first work to show that deep networks trained on unlabelled monocular videos can predict globally scale-consistent camera trajectories over long video sequences.

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

Traffic Flow Analysis with Multiple Adaptive Vehicle Detectors and Velocity Estimation with Landmark-Based Scanlines

  • M. Tran, Tung Dinh Duy, M. Do
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2018
A heuristic is proposed to check the fitness of a particular vehicle detector to a specific region in the camera's view using the mean velocity direction and the mean object size; it is expected to detect vehicles with high accuracy, in both precision and recall, even for tiny objects.

Are we ready for autonomous driving? The KITTI vision benchmark suite

The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.

Triangulation Learning Network: From Monocular to Stereo 3D Object Detection

This paper proposes to employ 3D anchors to explicitly construct object-level correspondences between the regions of interest in stereo images, from which the deep neural network learns to detect and triangulate the targeted object in 3D space.

Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

This paper proposes to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal, and achieves impressive improvements over the existing state of the art in image-based performance.
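The core pseudo-LiDAR conversion is a back-projection of a per-pixel depth map into a 3-D point cloud using camera intrinsics. The sketch below shows the standard pinhole back-projection; the intrinsic parameters and the toy depth map are assumed values, and the function name is illustrative.

```python
import numpy as np

# Back-project a dense depth map into a camera-frame point cloud,
# the core step of the pseudo-LiDAR representation. Intrinsics:
# fx, fy = focal lengths in pixels; cx, cy = principal point.
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth                                       # depth along the optical axis
    x = (u - cx) * z / fx                           # camera-frame X
    y = (v - cy) * z / fy                           # camera-frame Y
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a 4x4 depth map with constant 10 m depth and
# assumed intrinsics -> one 3-D point per pixel.
depth = np.full((4, 4), 10.0)
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```

The resulting point cloud can then be fed to LiDAR-based 3D detectors, which is the bridge the paper exploits.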

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

The concept of end-to-end learning of optical flow is advanced and shown to work well, and faster variants are presented that allow optical-flow computation at up to 140 fps with accuracy matching the original FlowNet.

Object scene flow for autonomous vehicles

A novel model and dataset for 3D scene flow estimation with an application to autonomous driving by representing each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.

Piecewise Rigid Scene Flow

A novel model is introduced that represents the dynamic 3D scene by a collection of planar, rigidly moving local segments; it achieves leading performance, exceeding competing 3D scene-flow methods, and even yields better 2D motion estimates than all tested dedicated optical-flow techniques.