Corpus ID: 49563711

Semi-supervised Learning: Fusion of Self-supervised, Supervised Learning, and Multimodal Cues for Tactical Driver Behavior Detection

@article{Narayanan2018SemisupervisedLF,
  title={Semi-supervised Learning: Fusion of Self-supervised, Supervised Learning, and Multimodal Cues for Tactical Driver Behavior Detection},
  author={Athma Narayanan and Yi-Ting Chen and Srikanth Malla},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.00864}
}
In this paper, we present a preliminary study of tactical driver behavior detection from untrimmed naturalistic driving recordings. While supervised learning is a common approach to detection, it suffers when labeled data is scarce, and manual annotation is both time-consuming and expensive. To highlight this problem, we experimented on a 104-hour real-world naturalistic driving dataset annotated with a set of predefined driving behaviors. There are three challenges in the dataset. First… 

Citations

Data Transformation Insights in Self-supervision with Clustering Tasks

It is shown theoretically and empirically that a certain set of transformations helps the convergence of self-supervised clustering and yields a faster convergence rate with valid transformations, for convex as well as certain families of non-convex objectives, along with a proof of convergence to the original set of optima.

A Survey on Long-Tailed Visual Recognition

This survey focuses on the problems caused by long-tailed data distributions, sorts out the representative long-tailed visual recognition datasets, summarizes the mainstream long-tailed studies, and quantitatively examines 20 widely used, large-scale visual datasets proposed in the last decade.

Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional Networks

This paper proposes a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications and introduces three novel concepts into graph convolutional networks (GCNs) to encode ego-thing and ego-stuff interactions.

A Review of Vision-Based Traffic Semantic Understanding in ITSs

All kinds of traffic monitoring analysis methods are classified from the two perspectives of macro traffic flow and micro road behavior, and the existing traffic monitoring challenges and their corresponding solutions are analyzed.

References


Learning a Driving Simulator

This paper investigates variational autoencoders with classical and learned cost functions, using generative adversarial networks to embed road frames, and learns a transition model in the embedded space with action-conditioned recurrent neural networks.

Unsupervised Learning of Depth and Ego-Motion from Video

Empirical evaluation demonstrates the effectiveness of the unsupervised learning framework: monocular depth estimation performs comparably with supervised methods that use either ground-truth pose or depth for training, and pose estimation performs favorably against established SLAM systems under comparable input settings.

Focal Loss for Dense Object Detection

This paper addresses the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross-entropy loss so that it down-weights the loss assigned to well-classified examples. The resulting Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
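The down-weighting described above can be sketched in a few lines. This is an illustrative NumPy version of the binary focal loss with the paper's commonly cited defaults (gamma = 2, alpha = 0.25), not RetinaNet's actual implementation:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p : predicted probability of the positive class, in (0, 1)
    y : ground-truth label, 0 or 1
    """
    p_t = np.where(y == 1, p, 1.0 - p)              # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A well-classified easy negative (p = 0.1, y = 0) contributes orders of
# magnitude less loss than a misclassified hard positive (p = 0.1, y = 1).
easy = focal_loss(np.array([0.1]), np.array([0]))
hard = focal_loss(np.array([0.1]), np.array([1]))
```

With gamma = 0 and alpha = 0.5 this reduces (up to a constant factor) to the standard cross-entropy, which is how the paper frames the modification.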

FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture

This paper proposes an encoder-decoder type network, where the encoder part is composed of two branches of networks that simultaneously extract features from RGB and depth images and fuse depth features into the RGB feature maps as the network goes deeper.
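The fusion step itself is simple: at matching stages, the depth branch's feature map is added element-wise into the RGB branch's feature map. The sketch below is a hedged toy version, with a plain ReLU standing in for a real encoder stage rather than the paper's VGG-based branches:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((3, 8, 8))    # C x H x W RGB input features
depth = rng.standard_normal((3, 8, 8))  # matching depth input features

rgb_feat = relu(rgb)      # stand-in for one RGB encoder stage
depth_feat = relu(depth)  # stand-in for one depth encoder stage

# FuseNet-style fusion: inject depth features into the RGB stream by
# element-wise addition; the result feeds the next, deeper RGB stage.
fused = rgb_feat + depth_feat
```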

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses semantic image segmentation with deep learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
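The building block of ASPP is atrous (dilated) convolution: the same filter samples the input at a larger stride, enlarging the receptive field without adding parameters. A minimal 1-D sketch (the helper name `atrous_conv1d` is illustrative, not from any library):

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """'Valid' 1-D convolution of x with filter w dilated by `rate`."""
    k = len(w)
    span = (k - 1) * rate + 1  # effective receptive field of the filter
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(w[j] * x[i + j * rate] for j in range(k)))
    return np.array(out)

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
# rate=1 is ordinary convolution (3-sample span); rate=2 covers 5 samples
# with the same three weights. ASPP runs several rates in parallel and
# fuses the resulting multi-scale outputs.
y1 = atrous_conv1d(x, w, rate=1)
y2 = atrous_conv1d(x, w, rate=2)
```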

Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
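The pyramid is built by a top-down pathway with lateral connections: each coarser map is upsampled and added element-wise to the next finer backbone map. The toy sketch below uses nearest-neighbour upsampling and constant feature maps, and omits the 1x1 lateral and 3x3 smoothing convolutions of the actual FPN:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a C x H x W map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Toy backbone features (C x H x W) at three strides; in FPN each level
# is first mapped to a shared channel width by a 1x1 conv, omitted here
# since the maps already share C.
c3 = np.ones((2, 8, 8))
c4 = np.ones((2, 4, 4)) * 2
c5 = np.ones((2, 2, 2)) * 3

# Top-down pathway with lateral (element-wise add) connections:
p5 = c5
p4 = c4 + upsample2x(p5)
p3 = c3 + upsample2x(p4)
```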

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Clear empirical evidence is given that training with residual connections significantly accelerates the training of Inception networks, and several new streamlined architectures are presented for both residual and non-residual Inception networks.

Long Short-Term Memory

A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
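The "constant error carousel" refers to the cell state being updated additively (f * c + i * g), which is what lets error flow across long time lags. A minimal single-step NumPy sketch of a modern LSTM cell (weight shapes are illustrative; real implementations batch the four gates into one matrix multiply):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c_new = f * c + i * g      # additive update: the error carousel
    h_new = o * np.tanh(c_new)
    return h_new, c_new

D, H = 3, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
```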

ImageNet: A large-scale hierarchical image database

A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.