MAAD: A Model and Dataset for "Attended Awareness" in Driving

@article{gopinath2021maad,
  title={MAAD: A Model and Dataset for "Attended Awareness" in Driving},
  author={Deepak Edakkattil Gopinath and Guy Rosman and Simon Stent and Katsuya Terahata and Luke Fletcher and Brenna Argall and John F. Leonard},
  journal={2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021}
}
We propose a computational model to estimate a person’s attended awareness of their environment. We define "attended awareness" to be those parts of a potentially dynamic scene which a person has attended to in recent history and which they are still likely to be physically aware of. Our model takes as input scene information in the form of a video and noisy gaze estimates, and outputs visual saliency, a refined gaze estimate and an estimate of the person’s attended awareness. In order to test… 
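The abstract defines attended awareness as the recently attended, still-remembered parts of a scene, estimated from video and noisy gaze. A minimal toy sketch of that idea (not the MAAD network; the function name, grid size, and exponential decay are illustrative assumptions) is to accumulate per-frame gaze heatmaps and let them fade over time:

```python
import numpy as np

def attended_awareness(gaze_points, frame_shape=(48, 64), sigma=3.0, decay=0.9):
    """Toy illustration, NOT the MAAD model: splat each noisy gaze
    estimate into a Gaussian heatmap and decay the accumulated map,
    so high values mark regions attended in recent history."""
    h, w = frame_shape
    ys, xs = np.mgrid[0:h, 0:w]
    awareness = np.zeros(frame_shape)
    maps = []
    for gx, gy in gaze_points:  # one noisy gaze estimate per frame
        heat = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma ** 2))
        awareness = decay * awareness + heat  # recent history, fading
        maps.append(awareness / awareness.max())
    return maps

# Three frames: gaze moves from (10, 10) to (50, 30) and dwells there.
maps = attended_awareness([(10, 10), (50, 30), (50, 30)])
```

In this sketch the earlier gaze location keeps a nonzero (but decayed) value in the final map, which is the intuition behind "attended to in recent history and still likely to be aware of"; the actual paper learns this mapping from video and gaze jointly.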


Predicting the Driver's Focus of Attention: The DR(eye)VE Project
This work proposes a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics, and introduces DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available.
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
This work presents Gaze360, a large-scale remote gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images, and proposes a simple self-supervised approach to improve cross-dataset domain adaptation.
Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition
This work complements existing state-of-the-art large-scale dynamic computer vision annotated datasets like Hollywood-2 and UCF Sports with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks, and introduces novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects.
Delving into egocentric actions
This work presents a novel set of egocentric features, shows how they can be combined with motion and object features, and uncovers a significant performance boost over all previous state-of-the-art methods.
Probabilistic learning of task-specific visual attention
This work proposes a unified Bayesian approach for modeling task-driven visual attention, and shows that it is able to predict human attention and gaze better than the state-of-the-art, by a large margin.
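A Bayesian treatment of task-driven attention can be caricatured as combining a task-specific prior with bottom-up evidence. The sketch below is an illustrative assumption, not the paper's actual model: it takes the pointwise product of a bottom-up saliency map and a task prior, renormalized into a probability map over locations.

```python
import numpy as np

def task_modulated_saliency(bottom_up, task_prior):
    """Toy Bayesian combination (illustrative, not the paper's model):
    posterior attention ∝ bottom-up saliency × task-specific prior,
    renormalized to sum to 1 over image locations."""
    post = bottom_up * task_prior
    return post / post.sum()
```

Under a uniform task prior this reduces to the normalized bottom-up map, while a peaked prior concentrates predicted attention on task-relevant regions.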
STAViS: Spatio-Temporal AudioVisual Saliency Network
Evaluation results across databases indicate that the STAViS model outperforms the authors' visual-only variant as well as the other state-of-the-art models in the majority of cases, suggesting that it is appropriate for estimating saliency "in the wild".
A Deeper Look at Saliency: Feature Contrast, Semantics, and Beyond
A deep learning model based on fully convolutional networks (FCNs) is presented, which shows very favorable performance across a wide variety of benchmarks relative to existing proposals, and demonstrates that the manner in which training data is selected and ground truth is treated is critical to the resulting model behaviour.
Recurrent Mixture Density Network for Spatiotemporal Visual Attention
This work proposes a spatiotemporal attentional model that learns where to look in a video directly from human fixation data; it is optimized via maximum likelihood estimation on human fixations, without knowledge of the action in each video.
Eye guidance in natural vision: reinterpreting salience.
It is argued that there is a need to move away from this class of model and find the principles that govern gaze allocation in a broader range of settings, because the stimulus context is limited, and the dynamic, task-driven nature of vision is not represented.
Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model
This paper presents a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms, and shows, through an extensive evaluation, that the proposed architecture outperforms the current state-of-the-art on public saliency prediction datasets.