BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

@article{Yu2020BDD100KAD,
  title={BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning},
  author={Fisher Yu and Haofeng Chen and Xin Wang and Wenqi Xian and Yingying Chen and Fangchen Liu and Vashisht Madhavan and Trevor Darrell},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={2633-2642}
}
Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition…
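BDD100K distributes its annotations as JSON label files. The sketch below tallies detection categories and weather tags from one such file; it is a minimal sketch assuming the BDD100K-style schema (field names such as `attributes`, `labels`, `category`, and `box2d` reflect the public release, but treat them as assumptions here rather than a definitive API).

```python
import json
from collections import Counter

def summarize_bdd100k_labels(path):
    """Tally detection categories and weather tags in one BDD100K-style label file."""
    with open(path) as f:
        frames = json.load(f)  # assumed: a list with one entry per image
    categories, weather = Counter(), Counter()
    for frame in frames:
        # "attributes"/"weather" and "labels"/"box2d" are assumed field names
        weather[frame.get("attributes", {}).get("weather", "unknown")] += 1
        for label in frame.get("labels", []):
            if "box2d" in label:  # count 2D detection boxes, skip lane/drivable annotations
                categories[label["category"]] += 1
    return categories, weather
```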
Citations

Diverse Complexity Measures for Dataset Curation in Self-driving
TLDR: A new data selection method is proposed that exploits a diverse set of criteria to quantify the interestingness of traffic scenes, and is able to select datasets that lead to better generalization and higher performance.
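As a toy illustration of diversity-driven curation (not the paper's method), the sketch below greedily picks scenes whose tags cover the most values not yet represented in the subset; the tag format and scoring are hypothetical.

```python
def greedy_diverse_subset(scene_tags, budget):
    """Greedily select scenes maximizing coverage of categorical scene tags.

    scene_tags: list of tag tuples, one per candidate scene, e.g.
                ("rainy", "night", "highway") -- a hypothetical format.
    """
    chosen, covered = [], set()
    remaining = set(range(len(scene_tags)))
    while remaining and len(chosen) < budget:
        # Score each candidate by how many new tag values it contributes.
        best = max(remaining, key=lambda i: len(set(scene_tags[i]) - covered))
        chosen.append(best)
        covered |= set(scene_tags[best])
        remaining.remove(best)
    return chosen
```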
IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving
TLDR: A new large-scale synthetic dataset for semantic segmentation with more than 100 different source visual domains is created to explicitly address the challenges of domain shift between training and test data in various weather and viewpoint conditions.
Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
TLDR: This paper carries out an extensive experimental exploration of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection).
One Million Scenes for Autonomous Driving: ONCE Dataset
TLDR: The ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario is introduced, along with a benchmark evaluating a variety of self-supervised and semi-supervised methods on the dataset.
SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving
TLDR: It is shown that SODA10M can serve as a promising pretraining dataset for different self-supervised learning methods, giving superior performance when fine-tuning on autonomous driving downstream tasks; the dataset will be used to hold the ICCV2021 SSLAD challenge.
A Survey on Deep Learning Based Approaches for Scene Understanding in Autonomous Driving
TLDR: This paper provides a comprehensive survey of deep learning-based approaches for scene understanding in autonomous driving, categorizing the works into four streams: object detection, full-scene semantic segmentation, instance segmentation, and lane line segmentation.
Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation
TLDR: Experimental results prove the effectiveness of the proposed UDA strategy in adapting a segmentation network trained on synthetic datasets, like GTA5 and SYNTHIA, to real-world datasets like Cityscapes and Mapillary.
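A common building block of such adversarial adaptation methods is a gradient reversal layer, which lets a domain discriminator's loss push the backbone toward domain-invariant features. Below is a minimal PyTorch sketch of that generic layer, not the paper's exact architecture.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; negate (and scale) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: features from a segmentation backbone pass through
# grad_reverse before a domain classifier; minimizing the domain loss
# then maximizes domain confusion for the backbone.
```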
Multi-task Learning with Attention for End-to-end Autonomous Driving
TLDR: A novel multi-task attention-aware network in the conditional imitation learning (CIL) framework is proposed, which improves not only the success rate on standard benchmarks but also the ability to react to traffic lights.
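To make the shared-encoder idea concrete, here is a minimal PyTorch sketch of a multi-task network whose heads gate shared features with per-task attention masks. The layer sizes and task names (`steer`, `speed`) are hypothetical, not the paper's CIL architecture.

```python
import torch
import torch.nn as nn

class MultiTaskAttentionNet(nn.Module):
    """Shared encoder; each task head gates the shared features with its own attention mask."""
    def __init__(self, in_ch=3, feat_ch=64, tasks=("steer", "speed")):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One sigmoid-gated 1x1 conv attention mask per task.
        self.attn = nn.ModuleDict({
            t: nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid()) for t in tasks
        })
        # One scalar regression head per task.
        self.heads = nn.ModuleDict({
            t: nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 1))
            for t in tasks
        })

    def forward(self, x):
        shared = self.encoder(x)
        return {t: self.heads[t](self.attn[t](shared) * shared) for t in self.attn}
```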
One-Shot Summary Prototypical Network Toward Accurate Unpaved Road Semantic Segmentation
TLDR: The OSPNet improves previous two-branch few-shot segmentation approaches by introducing a summary branch that enables channel-wise weighting of important features in the feature maps of the support and query branches, and it quantitatively and qualitatively outperforms recent supervised and few-shot segmentation models.
A*3D Dataset: Towards Autonomous Driving in Challenging Environments
TLDR: A new challenging A*3D dataset consisting of RGB images and LiDAR data with significant diversity of scene, time, and weather is introduced; it addresses gaps in existing datasets to push autonomous driving research toward more challenging, highly diverse environments.

References

Showing 1-10 of 46 references
Learning multiple visual domains with residual adapters
TLDR: This paper develops a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains, and introduces the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture ten very different visual domains and to recognize them uniformly well.
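The core mechanism is a small domain-specific module attached to a shared, frozen backbone. The sketch below wraps a conv layer with a 1x1 residual adapter in that spirit; it is a simplified sketch, not the published architecture.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Wrap a frozen conv layer with a small per-domain 1x1 conv correction."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():  # backbone weights stay shared and frozen
            p.requires_grad = False
        self.adapter = nn.Conv2d(conv.out_channels, conv.out_channels, kernel_size=1)
        nn.init.zeros_(self.adapter.weight)  # start as an identity mapping
        nn.init.zeros_(self.adapter.bias)

    def forward(self, x):
        y = self.conv(x)
        return y + self.adapter(y)  # domain-specific residual correction
```

Only the adapter's few parameters are trained per domain, which is why one backbone can serve many visual domains.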
End-to-End Learning of Driving Models from Large-Scale Video Datasets
TLDR: This work advocates learning a generic vehicle motion model from large-scale crowd-sourced video data, and develops an end-to-end trainable architecture for predicting a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state.
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
TLDR: This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
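The amplification loop can be summarized as: train a classifier on verified labels, auto-accept its confident predictions, and route only the ambiguous band to annotators. A minimal sketch, assuming any scikit-learn-style classifier with `predict_proba` and hypothetical confidence thresholds:

```python
import numpy as np

def human_in_the_loop_round(clf, X_unlabeled, hi=0.95, lo=0.05):
    """One amplification round: auto-accept confident positives and negatives,
    send only the ambiguous middle band to human annotators."""
    p = clf.predict_proba(X_unlabeled)[:, 1]  # probability of the positive class
    auto_pos = np.where(p >= hi)[0]
    auto_neg = np.where(p <= lo)[0]
    needs_human = np.where((p > lo) & (p < hi))[0]
    return auto_pos, auto_neg, needs_human
```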
CityPersons: A Diverse Dataset for Pedestrian Detection
TLDR: This work revisits CNN design and points out key adaptations enabling a plain Faster R-CNN to obtain state-of-the-art results on the Caltech dataset, and introduces CityPersons, a new set of person annotations on top of the Cityscapes dataset, to achieve further improvement from more and better data.
The 2017 DAVIS Challenge on Video Object Segmentation
TLDR: The scope of the benchmark, the main characteristics of the dataset, the evaluation metrics of the competition, and a detailed analysis of the participants' results are described.
Taskonomy: Disentangling Task Transfer Learning
TLDR: This work proposes a fully computational approach for modeling the structure of the space of visual tasks by finding (first- and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space, and provides a computational taxonomic map for task transfer learning.
The Cityscapes Dataset for Semantic Urban Scene Understanding
TLDR: This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, which exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.
Learning Deep Features for Scene Recognition using Places Database
TLDR: A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets and has more diversity.
Vision meets robotics: The KITTI dataset
TLDR: A novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research is presented, using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras and a high-precision GPS/IMU inertial navigation system.
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
TLDR: This work introduces UCF101, currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach with an overall performance of 44.5%.
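A bag-of-visual-words baseline of that kind clusters local descriptors into a vocabulary and classifies histogram encodings. The sketch below assumes precomputed local descriptors per clip (the feature choice and vocabulary size are assumptions, not the paper's exact pipeline).

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bovw_histograms(descriptor_sets, k=400, seed=0):
    """Cluster local descriptors into a visual vocabulary, then encode each
    clip as a normalized histogram of visual-word assignments.

    descriptor_sets: list of (n_i, d) arrays of local features per clip
                     (e.g., hypothetical HOG/HOF descriptors).
    """
    vocab = MiniBatchKMeans(n_clusters=k, random_state=seed)
    vocab.fit(np.vstack(descriptor_sets))  # learn the vocabulary on all descriptors
    hists = []
    for d in descriptor_sets:
        words = vocab.predict(d)
        h = np.bincount(words, minlength=k).astype(float)
        hists.append(h / max(h.sum(), 1.0))  # L1-normalize each histogram
    return np.array(hists), vocab

# The resulting histograms would then feed a linear classifier
# (e.g., sklearn.svm.LinearSVC) for action recognition.
```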