Playing for Benchmarks

@article{Richter2017PlayingFB,
  title={Playing for Benchmarks},
  author={Stephan R. Richter and Zeeshan Hayder and Vladlen Koltun},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={2232-2241}
}
We present a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in… 
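As a concrete illustration of how per-frame, multi-task ground truth like this might be consumed, here is a minimal Python sketch; the directory layout, file naming, and the load_frame helper are hypothetical illustrations, not the benchmark's actual distribution format.

from pathlib import Path
import numpy as np

# Hypothetical layout: one subdirectory per modality, one .npz archive per frame.
MODALITIES = ["flow", "instance_seg", "detection", "layout_3d", "odometry"]

def load_frame(root: Path, frame_id: int) -> dict:
    """Gather every ground-truth modality for a single frame index."""
    sample = {}
    for modality in MODALITIES:
        with np.load(root / modality / f"{frame_id:06d}.npz") as archive:
            sample[modality] = {key: archive[key] for key in archive.files}
    return sample

# Because every frame is annotated for every task, iterating over frames
# yields complete multi-task supervision:
# for i in range(num_frames):
#     sample = load_frame(Path("benchmark_root"), i)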

Citations

Unlimited Road-scene Synthetic Annotation (URSA) Dataset
TLDR
This work provides a method for persistent ground-truth asset annotation of a game world, using open-source tools and resources found in single-player modding communities, and demonstrates the method's real-time, on-demand ground-truth annotation capability.
NOVA: Rendering Virtual Worlds with Humans
TLDR
NOVA, a versatile framework for creating realistic-looking 3D rendered worlds containing procedurally generated humans with rich pixel-level ground-truth annotations, is presented; experiments indicate that the synthetic data generated by NOVA is a good proxy for the real world and can be exploited for computer vision tasks.
Playing for Depth
TLDR
This work presents a new depth dataset captured from video games in an easy and reproducible way, and shows that using synthetic datasets increases the accuracy of monocular depth estimation in the wild, where other approaches usually fail to generalize.
A tool for semi-automatic ground truth annotation of traffic videos
TLDR
A semi-automatic annotation tool, CVL Annotator, for bounding-box ground-truth generation in videos is presented, together with a preliminary user study that measures the time and clicks needed to produce ground-truth annotations of video traffic scenes and evaluates the accuracy of the final annotation results.
Visual Object Detection using Convolutional Neural Networks in a Virtual Environment
TLDR
This thesis uses a simulator to generate a synthetic dataset of 16 different types of vehicles captured from an airplane, trains and evaluates two state-of-the-art detectors on the generated dataset, and investigates fusion techniques between detectors trained on two different subsets of the dataset.
STEP: Segmenting and Tracking Every Pixel
TLDR
This work presents a new benchmark, Segmenting and Tracking Every Pixel (STEP), encompassing two datasets, KITTI-STEP and MOTChallenge-STEP, together with a new evaluation metric, Segmentation and Tracking Quality (STQ), which fairly balances the semantic and tracking aspects of the task and is suitable for evaluating sequences of arbitrary length.
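For context, the STEP paper defines STQ as the geometric mean of an association quality (AQ) term and a segmentation quality (SQ) term, where SQ is the familiar class-level mean IoU. Below is a minimal Python sketch of that combination; the full AQ definition operates on track-level overlaps and is more involved, so it is taken here as an input computed elsewhere.

import numpy as np

def segmentation_quality(pred_sem, gt_sem, num_classes):
    """SQ: mean IoU over classes, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred_sem == c, gt_sem == c).sum()
        union = np.logical_or(pred_sem == c, gt_sem == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

def stq(aq, sq):
    """STQ: geometric mean of association quality and segmentation quality."""
    return float(np.sqrt(aq * sq))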
Silver: Novel Rendering Engine for Data Hungry Computer Vision Models
TLDR
In this work, a new rendering engine called Silver is presented in detail; it can be used to provide clean, unbiased, and large-scale training and testing data for various computer vision tasks.
The ApolloScape Dataset for Autonomous Driving
TLDR
A large-scale open dataset consisting of RGB videos and corresponding dense 3D point clouds is presented; it can benefit various autonomous-driving applications, including but not limited to 2D/3D scene understanding, localization, transfer learning, and driving simulation.
Visual Perception with Synthetic Data
TLDR
A method is developed that reconstructs the surface of objects from a single view under uncalibrated illumination, along with a method that dramatically speeds up annotation by recognizing shared resources and automatically propagating annotations across the dataset.
…

References

Showing 1-10 of 69 references
Virtual Worlds as Proxy for Multi-Object Tracking Analysis
TLDR
This work proposes an efficient real-to-virtual world cloning method and validates the approach by building and publicly releasing a new video dataset, called "Virtual KITTI", automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow.
A Database and Evaluation Methodology for Optical Flow
TLDR
This paper proposes a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms and analyzes the results obtained to date to draw a large number of conclusions.
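The Middlebury evaluation also popularized the .flo file format for dense flow ground truth: a float32 magic value of 202021.25, two int32 dimensions, then interleaved per-pixel u/v components. A minimal Python reader:

import numpy as np

def read_flo(path):
    """Read a Middlebury .flo file into an (H, W, 2) float32 flow field."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        assert magic == 202021.25, "invalid .flo file (bad magic number)"
        width = int(np.fromfile(f, np.int32, count=1)[0])
        height = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * width * height)
    return data.reshape(height, width, 2)  # channels are (u, v) displacements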
The HCI Benchmark Suite: Stereo and Flow Ground Truth with Uncertainties for Urban Autonomous Driving
TLDR
A new stereo and optical flow dataset is presented to complement existing benchmarks, specifically designed to be representative for urban autonomous driving, including realistic, systematically varied radiometric and geometric challenges which were previously unavailable.
A Naturalistic Open Source Movie for Optical Flow Evaluation
TLDR
A new optical flow data set derived from the open source 3D animated short film Sintel is introduced, which has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects.
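Middlebury and Sintel both rank methods primarily by average end-point error (EPE), the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal implementation for flow fields stored as (H, W, 2) arrays:

import numpy as np

def average_epe(pred_flow, gt_flow, valid=None):
    """Average end-point error; valid is an optional (H, W) boolean mask."""
    err = np.linalg.norm(pred_flow - gt_flow, axis=-1)  # per-pixel L2 error
    if valid is not None:
        err = err[valid]
    return float(err.mean())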
Understanding Real World Indoor Scenes with Synthetic Data
TLDR
This work focuses on depth-based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes.
The Cityscapes Dataset for Semantic Urban Scene Understanding
TLDR
This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.
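Cityscapes scores pixel-level labeling by per-class intersection-over-union averaged over classes (mIoU). A minimal confusion-matrix implementation; the ignore_label default of 255 follows the common trainId convention and is an assumption here:

import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean IoU from a confusion matrix, skipping pixels marked ignore_label."""
    mask = gt != ignore_label
    cm = np.bincount(
        num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)  # cm[i, j]: gt class i, predicted class j
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())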
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding…
SUN Database: Exploring a Large Collection of Scene Categories
TLDR
The Scene Understanding database is proposed, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse that contains 908 distinct scene categories and 131,072 images.
Object scene flow for autonomous vehicles
TLDR
A novel model and dataset for 3D scene flow estimation with an application to autonomous driving are proposed, representing each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.
The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
TLDR
This paper generates a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations, and conducts experiments with DCNNs showing that including SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.
…