The Cityscapes Dataset for Semantic Urban Scene Understanding

@article{Cordts2016TheCD,
  title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
  author={Marius Cordts and Mohamed Omran and Sebastian Ramos and Timo Rehfeld and Markus Enzweiler and Rodrigo Benenson and Uwe Franke and Stefan Roth and Bernt Schiele},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={3213-3223}
}
  • Published 6 April 2016
Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level… 
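For readers who want to try the benchmark directly, torchvision ships a ready-made Cityscapes loader; the sketch below is a minimal example, assuming the leftImg8bit and gtFine archives have already been downloaded from cityscapes-dataset.com (registration required) and unpacked under ./data.

    # Minimal sketch: loading Cityscapes semantic labels via torchvision.
    from torchvision.datasets import Cityscapes

    dataset = Cityscapes(
        root="./data",           # directory holding leftImg8bit/ and gtFine/
        split="train",           # one of: train, val, test
        mode="fine",             # fine or coarse annotations
        target_type="semantic",  # per-pixel class-ID mask
    )

    image, mask = dataset[0]     # both returned as PIL images
    print(image.size, mask.size)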
The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes
TLDR
The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25,000 high-resolution images annotated with 66 object categories and additional instance-specific labels for 37 classes, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.
SkyScapes - Fine-Grained Semantic Understanding of Aerial Scenes
TLDR
A novel multi-task model is proposed that incorporates semantic edge detection and is better tuned for feature extraction across a wide range of scales, achieving notable improvements over the baselines in region outlines and level of detail on both tasks.
Semantic Understanding of Scenes Through the ADE20K Dataset
TLDR
This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Spatially-Aware Domain Adaptation for Semantic Segmentation of Urban Scenes
TLDR
This work proposes a spatially-aware discriminator that accounts for spatial priors on the objects to improve feature alignment, and demonstrates in experiments that the model outperforms several state-of-the-art baselines in terms of mean intersection over union (mIoU).
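Several of the entries above report mean intersection over union (mIoU), the standard Cityscapes metric; as a reference, here is a minimal NumPy sketch of it (the ignore-index convention of 255 follows Cityscapes, the rest is an illustrative implementation, not any paper's exact evaluation code).

    import numpy as np

    def mean_iou(pred, gt, num_classes, ignore_index=255):
        # pred, gt: integer class-ID maps of identical shape.
        valid = gt != ignore_index          # skip unlabeled pixels
        ious = []
        for c in range(num_classes):
            p = (pred == c) & valid
            g = (gt == c) & valid
            union = np.logical_or(p, g).sum()
            if union == 0:
                continue                    # class absent in both maps
            ious.append(np.logical_and(p, g).sum() / union)
        return float(np.mean(ious))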
Efficient Annotation of Semantic Segmentation Datasets for Scene Understanding with Application to Autonomous Driving
TLDR
This project investigates techniques for reducing the effort involved in annotating large datasets for semantic segmentation, and proposes a novel active learning framework that is tested on the Cityscapes benchmark dataset.
Small Object Augmentation of Urban Scenes for Real-Time Semantic Segmentation
TLDR
This paper proposes a real-time segmentation model coined Narrow Deep Network (NDNet) and builds a synthetic dataset by inserting additional small objects into the training images and achieves 65.7% mean intersection over union (mIoU) on the Cityscapes test set.
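The augmentation behind NDNet's synthetic dataset is a copy-paste scheme: cut small objects out of annotated images and composite them elsewhere in the training set. A simplified sketch of that idea follows (the mask-based compositing and random placement are assumptions for illustration, not the paper's exact procedure).

    import numpy as np

    def paste_object(image, label, obj_rgb, obj_mask, obj_class, rng):
        # Composite a cut-out object (RGB crop plus boolean mask) into a
        # training image at a random spot, updating the label map to match.
        H, W = label.shape
        h, w = obj_mask.shape
        y = int(rng.integers(0, H - h))
        x = int(rng.integers(0, W - w))
        image[y:y+h, x:x+w][obj_mask] = obj_rgb[obj_mask]
        label[y:y+h, x:x+w][obj_mask] = obj_class
        return image, label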
Unlimited Road-scene Synthetic Annotation (URSA) Dataset
TLDR
This work provides a method for persistent ground-truth asset annotation of a game world, using open-source tools and resources found in single-player modding communities, and demonstrates the method's real-time, on-demand ground-truth annotation capability.
Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
TLDR
The Naive-Student model, trained with this simple yet effective iterative semi-supervised learning scheme, attains state-of-the-art results on all three Cityscapes benchmarks, reaching 67.8% PQ, 42.6% AP, and 85.2% mIoU on the test set.
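The recipe behind Naive-Student is plain iterative self-training: a teacher labels the unannotated video frames, a student retrains on the union, and the roles swap. A high-level sketch follows (train() and predict() are hypothetical placeholders for a full segmentation pipeline).

    def self_training(labeled, unlabeled_frames, rounds=3):
        # train() and predict() are hypothetical helpers standing in for a
        # real segmentation training loop and per-pixel inference.
        teacher = train(labeled)
        for _ in range(rounds):
            pseudo = [(f, teacher.predict(f)) for f in unlabeled_frames]
            # The student sees human labels plus teacher pseudo-labels,
            # then becomes the teacher for the next round.
            teacher = train(labeled + pseudo)
        return teacher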
The EuroCity Persons Dataset: A Novel Benchmark for Object Detection
TLDR
The EuroCity Persons dataset is introduced, providing a large number of highly diverse, accurate, and detailed annotations of pedestrians, cyclists, and other riders in urban traffic scenes; it is nearly one order of magnitude larger than person datasets previously used for benchmarking.
...

References

Showing 1-10 of 94 references
Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction
TLDR
This paper presents what is, to the authors' knowledge, the first system that can perform dense, large-scale, outdoor semantic reconstruction of a scene in (near) real time, and presents a "semantic fusion" approach that handles dynamic objects more effectively than previous approaches.
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
TLDR
This paper annotates static 3D scene elements with rough bounding primitives and develops a model which transfers this information into the image domain and reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.
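The transfer step in the entry above rests on standard pinhole projection from 3D camera coordinates to pixel coordinates; a minimal sketch follows (the intrinsics fx, fy, cx, cy come from camera calibration and are assumed given).

    import numpy as np

    def project_to_image(points_cam, fx, fy, cx, cy):
        # Project Nx3 camera-frame points onto the image plane:
        # u = fx * X / Z + cx,  v = fy * Y / Z + cy
        X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
        return np.stack([fx * X / Z + cx, fy * Y / Z + cy], axis=1)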
3D Traffic Scene Understanding from Movable Platforms
TLDR
A novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene is presented.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
Learning Deep Features for Scene Recognition using Places Database
TLDR
A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets and has more diversity.
Automatic dense visual semantic mapping from street-level imagery
TLDR
A method is presented for producing a semantic map from multi-view street-level imagery, using two conditional random fields to model the semantic segmentation of the imagery while treating each image independently.
SUN RGB-D: A RGB-D scene understanding benchmark suite
TLDR
This paper introduces an RGB-D benchmark suite for advancing the state of the art in all major scene understanding tasks, and presents a dataset that enables training data-hungry algorithms for scene-understanding tasks, evaluating them with meaningful 3D metrics, avoiding overfitting to a small testing set, and studying cross-sensor bias.
Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation
TLDR
An approach to holistic scene understanding is presented that reasons jointly about regions; the location, class, and spatial extent of objects; the presence of a class in the image; and the scene type, outperforming the state-of-the-art on the MSRC-21 benchmark while being much faster.
Nonparametric semantic segmentation for 3D street scenes
  • Hu He, B. Upcroft
  • Computer Science
  • 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2013
TLDR
This paper uses stereo image pairs collected from cameras mounted on a moving car to produce dense depth maps, which are combined into a global 3D reconstruction using camera poses from stereo visual odometry; the resulting 3D semantic model is further improved by accounting for moving objects in the scene.
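The dense depth maps in this last entry come from the standard stereo relation Z = f * B / d; as a one-line sketch (focal length in pixels and baseline in meters are calibration inputs, assumed given):

    def depth_from_disparity(disparity_px, focal_px, baseline_m):
        # Stereo triangulation: depth (m) = focal (px) * baseline (m) / disparity (px)
        return focal_px * baseline_m / disparity_px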
...