Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

@article{Ramanishka2018TowardDS,
  title={Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning},
  author={Vasili Ramanishka and Yi-Ting Chen and Teruhisa Misu and Kate Saenko},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={7699-7707}
}
Driving scene understanding is a key ingredient for intelligent transportation systems. [...] Key Method: A novel annotation methodology is introduced to enable research on driver behavior understanding from untrimmed data sequences. As the first step, baseline algorithms for driver behavior detection are trained and tested to demonstrate the feasibility of the proposed task.
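The paper's actual baselines are not reproduced on this page; the following is a minimal sketch, assuming pre-extracted per-frame CNN features and an LSTM that emits a behavior label for every frame, of how detection on untrimmed driving sequences is commonly set up. The feature dimension, hidden size, and class count are illustrative, not the paper's values.

```python
# Minimal sketch (not the paper's exact baseline): per-frame visual features are
# fed to an LSTM that emits a behavior label for every frame of an untrimmed
# sequence, the usual formulation for temporal behavior detection.
import torch
import torch.nn as nn

class FrameLevelBehaviorDetector(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256, num_classes=12):
        super().__init__()
        # feat_dim: dimensionality of pre-extracted per-frame CNN features.
        # num_classes: behavior classes plus a "background" class (hypothetical count).
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) features from an untrimmed sequence.
        hidden, _ = self.lstm(feats)
        return self.head(hidden)          # (batch, time, num_classes) per-frame logits

# Toy usage: 8-second clip at 3 fps with random features standing in for CNN output.
model = FrameLevelBehaviorDetector()
logits = model(torch.randn(1, 24, 2048))
per_frame_labels = logits.argmax(dim=-1)  # detected behavior per frame
print(per_frame_labels.shape)             # torch.Size([1, 24])
```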

Citations

Toward Reasoning of Driving Behavior
  • Teruhisa Misu, Yi-Ting Chen
  • Computer Science
    2018 21st International Conference on Intelligent Transportation Systems (ITSC)
  • 2018
TLDR
The Honda Research Institute Driving Dataset (HDD) is presented, a challenging dataset to enable research on learning driver behavior and causal reasoning in real-life environments, and an annotation scheme for describing complex driving behaviors is introduced.
DBUS: Human Driving Behavior Understanding System
TLDR
DBUS is a real-time driving behavior understanding system that works with front-view videos and GPS/IMU signals collected from daily driving scenarios; it mimics human driving intelligence by combining the representation capability of deep neural networks with recent advances in visual perception, video temporal segmentation, and attention mechanisms.
A system of vision sensor based deep neural networks for complex driving scene analysis in support of crash risk assessment and prevention
TLDR
To address the scarcity of annotated image datasets for studying traffic crashes, two completely new datasets have been developed and made available to the public, and they prove effective for training the proposed deep neural networks.
ROAD: The ROad event Awareness Dataset for Autonomous Driving
TLDR
The ROad event Awareness Dataset for Autonomous Driving is introduced, to the authors' knowledge the first of its kind, designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs, and the corresponding scene locations.
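The triplet definition lends itself to a small data structure. The sketch below is illustrative only; the field names and types are assumptions, not the dataset's actual annotation schema.

```python
# Illustrative container for a ROAD-style road event: an active agent, the
# action(s) it performs, and the scene location(s) where it happens.
# Field names and types are assumptions, not the dataset's real schema.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class RoadEvent:
    agent: str                                   # e.g. "pedestrian", "cyclist"
    actions: List[str]                           # e.g. ["moving", "crossing road"]
    locations: List[str]                         # e.g. ["in vehicle lane"]
    frame_range: Tuple[int, int] = (0, 0)        # temporal extent in frames
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)

event = RoadEvent(
    agent="pedestrian",
    actions=["moving", "crossing road"],
    locations=["in vehicle lane"],
    frame_range=(120, 180),
)
print(event.agent, event.actions, event.frame_range)
```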
Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring
TLDR
This work proposes a Convolutional-LSTM (ConvLSTM)-based auto-encoder to extract motion features from the out-of-cabin traffic, trains a classifier that jointly considers motion from both inside and outside the cabin for maneuver intention anticipation, and experimentally shows that the in-cabin and out-of-cabin image features carry complementary information.
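A simplified stand-in for the joint in-cabin / out-of-cabin idea is sketched below: two feature streams (assumed to be precomputed, e.g. by a motion auto-encoder for the traffic view and a driver-monitoring encoder for the cabin view) are fused and classified into maneuver intentions. This late-fusion sketch is not the paper's exact architecture, and all sizes are made up.

```python
# Late-fusion sketch: concatenate in-cabin and out-of-cabin feature sequences,
# run an LSTM over time, and classify the upcoming maneuver.
import torch
import torch.nn as nn

class IntentionAnticipator(nn.Module):
    def __init__(self, in_cabin_dim=128, out_cabin_dim=128, hidden_dim=128, num_maneuvers=5):
        super().__init__()
        self.lstm = nn.LSTM(in_cabin_dim + out_cabin_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_maneuvers)

    def forward(self, in_cabin, out_cabin):
        # in_cabin, out_cabin: (batch, time, dim) feature sequences.
        fused = torch.cat([in_cabin, out_cabin], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        return self.head(h_n[-1])            # maneuver logits from the last hidden state

model = IntentionAnticipator()
logits = model(torch.randn(2, 30, 128), torch.randn(2, 30, 128))
print(logits.shape)                           # torch.Size([2, 5])
```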
Grounding Human-To-Vehicle Advice for Self-Driving Vehicles
TLDR
It is shown that taking advice improves the performance of the end-to-end network, while the network cues on a variety of visual features that are provided by the advice.
Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles
TLDR
This work introduces the novel domain-specific Drive&Act benchmark for fine-grained categorization of driver behavior, and provides challenging benchmarks by adopting prominent methods for video- and body pose-based action recognition.
Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions
TLDR
This work proposes a novel data-driven approach that uses scene-graphs as intermediate representations for modeling the subjective risk of driving maneuvers and demonstrates that this approach can learn effectively even from smaller datasets.
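A rough sketch of the scene-graph idea follows: detected objects become nodes, spatial relations become edges, and a graph-level readout yields a subjective-risk score. The single message-passing layer, layer sizes, and sigmoid readout below are assumptions for brevity, not the paper's architecture.

```python
# Scene-graph risk sketch: one round of neighbor message passing over node
# features, mean pooling over the graph, and a scalar risk score in [0, 1].
import torch
import torch.nn as nn

class SceneGraphRiskModel(nn.Module):
    def __init__(self, node_dim=16, hidden_dim=32):
        super().__init__()
        self.msg = nn.Linear(node_dim, hidden_dim)     # messages from neighbors
        self.upd = nn.Linear(node_dim + hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, 1)        # graph-level risk score

    def forward(self, node_feats, adj):
        # node_feats: (num_nodes, node_dim); adj: (num_nodes, num_nodes) 0/1 matrix.
        messages = adj @ torch.relu(self.msg(node_feats))          # aggregate neighbors
        updated = torch.relu(self.upd(torch.cat([node_feats, messages], dim=-1)))
        return torch.sigmoid(self.readout(updated.mean(dim=0)))    # risk in [0, 1]

# Toy graph: ego-vehicle, a lead car, and a pedestrian, with two relations.
feats = torch.randn(3, 16)
adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
print(SceneGraphRiskModel()(feats, adj))   # a single risk value
```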
Visuomotor Understanding for Representation Learning of Driving Scenes
TLDR
This work leverages the large-scale unlabeled yet naturally paired data for visual representation learning in the driving scenario and demonstrates that the learned representation can benefit other tasks that require detailed scene understanding and outperforms competing unsupervised representations on semantic segmentation.

References

Showing 1-10 of 35 references
DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
TLDR
This paper proposes to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving and argues that the direct perception representation provides the right level of abstraction.
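The direct-perception idea can be sketched as a CNN that regresses a handful of affordance indicators from the input image. The indicator names below are illustrative examples of the kind used in DeepDriving (heading angle, distances to lane markings and to the preceding car); the tiny network is a toy stand-in, not the paper's model.

```python
# Direct-perception sketch: a small CNN regresses a few affordance indicators
# from an image; a controller would then act on these indicators directly.
import torch
import torch.nn as nn

AFFORDANCES = ["angle_to_lane", "dist_left_marking", "dist_right_marking", "dist_preceding_car"]

class AffordanceNet(nn.Module):
    def __init__(self, num_affordances=len(AFFORDANCES)):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_affordances)    # one regressed value per indicator

    def forward(self, image):
        feats = self.backbone(image).flatten(1)
        return self.head(feats)

preds = AffordanceNet()(torch.randn(1, 3, 210, 280))
print(dict(zip(AFFORDANCES, preds.squeeze(0).tolist())))
```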
Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior
TLDR
A novel dataset is introduced which in addition to providing the bounding box information for pedestrian detection, also includes the behavioral and contextual annotations for the scenes, which allows combining visual and semantic information for better understanding of pedestrians' intentions in various traffic scenarios.
End-to-End Learning of Driving Models from Large-Scale Video Datasets
TLDR
This work advocates learning a generic vehicle motion model from large scale crowd-sourced video data, and develops an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state.
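A condensed sketch of predicting a distribution over future egomotion follows: per-frame visual features and the previous motion feed an LSTM, and a softmax over discretized motion bins is trained with cross-entropy. The four-way discretization (straight / slow / left / right) and all sizes are assumptions, not the paper's setup.

```python
# Egomotion-distribution sketch: LSTM over (frame features, previous motion),
# softmax over discrete motion bins for the next step.
import torch
import torch.nn as nn

class EgomotionPredictor(nn.Module):
    def __init__(self, feat_dim=512, motion_bins=4, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + motion_bins, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, motion_bins)

    def forward(self, frame_feats, prev_motion_onehot):
        x = torch.cat([frame_feats, prev_motion_onehot], dim=-1)
        h, _ = self.lstm(x)
        return self.head(h[:, -1])              # logits over next-step motion bins

model = EgomotionPredictor()
logits = model(torch.randn(2, 10, 512), torch.eye(4)[torch.zeros(2, 10, dtype=torch.long)])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1, 3]))
print(logits.softmax(dim=-1), loss.item())
```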
Learning a Driving Simulator
TLDR
This paper investigates variational autoencoders, with both classical and adversarially learned cost functions, for embedding road frames, and learns a transition model in the embedded space using action-conditioned Recurrent Neural Networks.
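The two-stage idea can be sketched compactly: a VAE embeds road frames into a latent code, and an action-conditioned recurrent network predicts the next latent code. The GRU choice, the MSE transition loss, and all sizes below are assumptions made for a self-contained example.

```python
# Condensed sketch: VAE encoder for road frames + action-conditioned GRU
# transition model operating in the latent space.
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(), nn.Flatten())
        self.to_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.to_logvar = nn.Linear(64 * 16 * 16, latent_dim)

    def encode(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample

class TransitionRNN(nn.Module):
    def __init__(self, latent_dim=32, action_dim=2, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(latent_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, latents, actions):
        h, _ = self.gru(torch.cat([latents, actions], dim=-1))
        return self.head(h)                                       # predicted next latents

vae, trans = FrameVAE(), TransitionRNN()
frames = torch.randn(5, 3, 64, 64)                 # a short clip of road frames
z = vae.encode(frames).unsqueeze(0)                # (1, 5, 32) latent sequence
pred = trans(z[:, :-1], torch.randn(1, 4, 2))      # predict z_{t+1} from (z_t, a_t)
loss = nn.functional.mse_loss(pred, z[:, 1:])
print(loss.item())
```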
Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models
TLDR
This work proposes an Autoregressive Input-Output HMM to model the contextual information along with driving maneuvers, and shows that it can anticipate maneuvers 3.5 seconds before they occur with over 80% F1-score in real time.
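To make the anticipation mechanism concrete, here is a plain discrete-HMM forward filter, a deliberately simplified stand-in for the paper's autoregressive input-output HMM: a belief over latent maneuvers is updated online as driver-observation cues stream in. All probabilities are made-up toy numbers.

```python
# Simplified stand-in (plain HMM, not the paper's AIO-HMM): online forward
# filtering over latent maneuvers given a stream of discrete observations.
import numpy as np

states = ["keep_lane", "pre_left_turn", "pre_right_turn"]
T = np.array([[0.90, 0.05, 0.05],     # transition probabilities between maneuvers
              [0.10, 0.90, 0.00],
              [0.10, 0.00, 0.90]])
E = np.array([[0.80, 0.10, 0.10],     # emission: P(observation | maneuver)
              [0.15, 0.80, 0.05],     # observations: 0=eyes ahead, 1=glance left, 2=glance right
              [0.15, 0.05, 0.80]])
belief = np.array([1.0, 0.0, 0.0])    # start in "keep_lane"

for obs in [0, 1, 1, 1]:              # the driver starts glancing left
    belief = E[:, obs] * (T.T @ belief)
    belief /= belief.sum()            # normalized filtering distribution
    print({s: round(p, 3) for s, p in zip(states, belief)})
# The probability of "pre_left_turn" rises before the turn actually happens,
# which is the anticipation effect the paper quantifies with F1-score.
```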
Are we ready for autonomous driving? The KITTI vision benchmark suite
TLDR
The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
The Cityscapes Dataset for Semantic Urban Scene Understanding
TLDR
This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.
Learning Image Representations Tied to Egomotion from Unlabeled Video
TLDR
This work proposes a new “embodied” visual learning paradigm, exploiting proprioceptive motor signals to train visual representations from egocentric video with no manual supervision, and shows that this unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks.
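A simplified sketch of the egomotion-tied pretext task follows: a shared encoder embeds two frames from the same video, and a classifier predicts the coarse egomotion between them, with "labels" derived from the vehicle's own motor signals rather than manual annotation. The three-way discretization and all sizes are assumptions.

```python
# Egomotion pretext sketch: siamese encoder over a frame pair, classifier over
# discretized egomotion classes; supervision comes from odometry, not humans.
import torch
import torch.nn as nn

class EgomotionPretext(nn.Module):
    def __init__(self, num_motion_classes=3):       # e.g. forward / turn left / turn right
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64 * 2, num_motion_classes)

    def forward(self, frame_t, frame_t1):
        pair = torch.cat([self.encoder(frame_t), self.encoder(frame_t1)], dim=-1)
        return self.classifier(pair)                 # logits over egomotion classes

model = EgomotionPretext()
logits = model(torch.randn(4, 3, 128, 128), torch.randn(4, 3, 128, 128))
# "Labels" come for free from odometry/steering, so no human annotation is needed.
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (4,)))
loss.backward()                                      # updates the shared visual encoder
```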
The BDD-Nexar Collective: A Large-Scale, Crowdsourced Dataset of Driving Scenes
Vashisht Madhavan
TLDR
The BDD-Nexar dataset is a large-scale collection of urban driving scenes, composed of high-quality video sequences taken from multiple vehicles across three major cities in the United States: San Francisco, New York, and Los Angeles; it is shown to be a challenging and extensive benchmark for computer vision research on autonomous driving.
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
TLDR
A novel Convolutional-De-Convolutional (CDC) network is proposed that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data.
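The core idea can be sketched as follows: the 3D ConvNet has shrunk the temporal axis, so the head upsamples time back toward frame rate while collapsing the spatial axes, yielding per-frame class scores for temporal localization. The decomposition below (transposed convolution in time plus spatial mean pooling) is a simplification of the joint CDC filter described in the paper, and the sizes are illustrative.

```python
# CDC-style sketch: temporal upsampling of 3D ConvNet features via a transposed
# 3D convolution, followed by spatial pooling, giving per-frame class scores.
import torch
import torch.nn as nn

class CDCHead(nn.Module):
    def __init__(self, in_channels=512, num_classes=21, temporal_factor=8):
        super().__init__()
        # Upsample only the temporal dimension by `temporal_factor`.
        self.temporal_up = nn.ConvTranspose3d(
            in_channels, num_classes,
            kernel_size=(temporal_factor, 1, 1), stride=(temporal_factor, 1, 1))

    def forward(self, feats):
        # feats: (batch, channels, T/8, H, W) activations from a 3D ConvNet.
        scores = self.temporal_up(feats)          # (batch, classes, T, H, W)
        return scores.mean(dim=(-1, -2))          # spatial pooling -> per-frame scores

head = CDCHead()
per_frame = head(torch.randn(1, 512, 4, 7, 7))    # 4 downsampled steps -> 32 frames
print(per_frame.shape)                            # torch.Size([1, 21, 32])
```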