DeepSignals: Predicting Intent of Drivers Through Visual Signals

Davi Frossard, Eric Kee, Raquel Urtasun. 2019 International Conference on Robotics and Automation (ICRA).
Detecting the intention of drivers is an essential task in self-driving, necessary to anticipate sudden events like lane changes and stops. Turn signals and emergency flashers communicate such intentions, providing seconds of potentially critical reaction time. In this paper, we propose to detect these signals in video sequences by using a deep neural network that reasons about both spatial and temporal information. Our experiments on more than a million frames show high per-frame accuracy in… 


Driving Scene Understanding: How much temporal context and spatial resolution is necessary?

This work offers insights into the spatial resolution and temporal context (depth) of visual data required for driving scene understanding.

Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

A novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. This is achieved through a novel differentiable semantic occupancy representation that the motion planning process uses explicitly as a cost.

Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads

This paper uses recent computer vision techniques to identify possible irregularities on roads, the absence of street lights, and defective traffic signs in videos from a camera mounted on a moving vehicle, and quantitatively measures the overall safety of roads in the city through carefully constructed metrics.

Semantics for Robotic Mapping, Perception and Interaction: A Survey

A taxonomy for semantics research in or relevant to robotics is established, split into four broad categories of activity, in which semantics are extracted, used, or both, and dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics are surveyed.

Computer Vision Based Vehicle Intension Finding by Understanding Driver Hand Signal

An automatic system based on a Convolutional Neural Network (CNN) architecture is introduced that helps the ego-vehicle recognize the hand signals of another vehicle's driver and take the necessary actions in advance to prevent road accidents.



End-to-End Learning of Action Detection from Frame Glimpses in Videos

A fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions and uses REINFORCE to learn the agent's decision policy.

Social LSTM: Human Trajectory Prediction in Crowded Spaces

This work proposes an LSTM model that learns general human movement and predicts future trajectories; it outperforms state-of-the-art methods on some of the evaluated datasets.

Learning to tell brake and turn signals in videos using CNN-LSTM structure

This paper presents a deep learning method that learns to recognize rear signals from a sequence of frames. It obtains more accurate predictions than a CNN alone applied to the same time-sequence inputs.

Will this car change the lane? - Turn signal recognition in the frequency domain

This paper presents a new method to recognize turn signals of other vehicles in images using a robust vehicle detector and involves three major steps applied to each detected vehicle: light spot detection, feature extraction through FFT-based analysis of the temporal signal behavior at each detected light spot, and AdaBoost classification of the extracted feature set.
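The frequency-domain step described above can be illustrated with a minimal sketch. It assumes each detected light spot has already been reduced to one brightness value per frame, and it swaps the paper's AdaBoost classifier for a plain band threshold. The function names, the 1 to 2 Hz band, and the synthetic data are all hypothetical choices made for illustration, not details taken from the paper.

```python
import cmath
import math


def dominant_frequency(samples, fps):
    """Return the strongest non-DC frequency (in Hz) of a brightness series."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC component
    best_bin, best_power = 0, 0.0
    # Naive DFT over the positive-frequency bins; adequate for short windows.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * fps / n


def looks_like_blinker(samples, fps, low_hz=1.0, high_hz=2.0):
    """Hypothetical rule: flag spots whose dominant frequency lies in the band."""
    return low_hz <= dominant_frequency(samples, fps) <= high_hz


# Synthetic light spot: 1.5 Hz square wave sampled at 30 fps for 2 seconds
# (period of 20 frames: 10 frames on, 10 frames off).
fps = 30
blinking = [1.0 if (t % 20) < 10 else 0.0 for t in range(60)]
steady = [1.0] * 60  # e.g. a tail light that stays on

print(dominant_frequency(blinking, fps))
print(looks_like_blinker(blinking, fps), looks_like_blinker(steady, fps))
```

A real pipeline would feed several spectral features to a learned classifier rather than threshold a single peak; the sketch only shows why a periodic on/off pattern stands out in the spectrum while a steady light does not.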

Sequential Deep Learning for Human Action Recognition

A fully automated deep model that learns to classify human actions without using any prior knowledge is proposed. It outperforms existing deep models and gives results comparable with the best related works.

Turn Signal Detection During Nighttime by CNN Detector and Perceptual Hashing Tracking

The proposed method robustly detects and tracks the vehicle in front with over 95% accuracy, recognizes turn signals in night scenes with a detection rate of over 90%, and improves the miss rate of state-of-the-art systems by more than 20%.

Action Recognition using Visual Attention

A soft attention based model for action recognition in videos, built on multi-layered Recurrent Neural Networks with Long Short-Term Memory units that are deep both spatially and temporally.

VideoLSTM convolves, attends and flows for action recognition

3D Convolutional Neural Networks for Human Action Recognition

A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
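To make the idea of a 3D convolution concrete, here is a minimal single-channel sketch in plain Python. It is not the paper's architecture: the kernel is hand-picked rather than learned, the helper name `conv3d_valid` is hypothetical, and, as in most deep learning libraries, the operation is technically a cross-correlation (the kernel is not flipped).

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    volume is indexed [t][y][x]; kernel is indexed [dt][dy][dx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over a small spatio-temporal neighbourhood.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += (volume[t + dt][y + dy][x + dx]
                                    * kernel[dt][dy][dx])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: 4 frames of 3x3 pixels; the scene brightens at frame 2.
dark = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
bright = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
clip = dark + bright

# Hand-picked temporal-difference kernel (2x1x1): next frame minus current.
kernel = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, kernel)
print([plane[0][0] for plane in response])
```

Because the kernel spans two adjacent frames, the output is non-zero only at the time step where the brightness changes, which is the kind of motion cue a purely per-frame 2D convolution cannot see.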

Two-Stream Convolutional Networks for Action Recognition in Videos

This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
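One simple way to combine two such streams is late fusion, where the class probabilities of the two networks are averaged. The sketch below assumes each stream has already produced a vector of class logits. The function names and the example numbers are hypothetical, and averaging is shown only as one plausible fusion rule, not as a reproduction of the paper's exact setup.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(spatial_logits, temporal_logits):
    """Average the class probabilities of the spatial and temporal streams."""
    ps = softmax(spatial_logits)
    pt = softmax(temporal_logits)
    return [(a + b) / 2 for a, b in zip(ps, pt)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for three classes: the spatial stream (single RGB frame) is
# nearly undecided between classes 0 and 1, while the temporal stream
# (optical flow) clearly favours class 1.
spatial_logits = [2.0, 1.9, 0.1]
temporal_logits = [0.2, 3.0, 0.1]

fused = fuse_two_stream(spatial_logits, temporal_logits)
print(argmax(softmax(spatial_logits)), argmax(fused))
```

In this toy case the spatial stream alone picks class 0, but the fused scores pick class 1, which illustrates how motion information can resolve an ambiguity in appearance.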