Are we ready for autonomous driving? The KITTI vision benchmark suite

@article{Geiger2012AreWR,
  title={Are we ready for autonomous driving? The KITTI vision benchmark suite},
  author={Andreas Geiger and Philip Lenz and Raquel Urtasun},
  journal={2012 IEEE Conference on Computer Vision and Pattern Recognition},
  year={2012},
  pages={3354-3361}
}
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of…
Citations

Vision meets robotics: The KITTI dataset
A novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research, using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras and a high-precision GPS/IMU inertial navigation system.
High Accuracy Monocular SFM and Scale Correction for Autonomous Driving
A novel data-driven mechanism for cue combination is presented that allows highly accurate ground plane estimation by adapting the observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis.
Efficient Distributed Training of Vehicle Vision Systems
Self-driving vehicle vision systems must deal with an extremely broad and challenging set of scenes. We propose a distributed training regimen for a CNN vision system whereby vehicles in the field…
Self-Supervised Visual Odometry with Ego-Motion Sampling
In recent years, deep learning-based methods for monocular visual odometry have made good progress and now demonstrate state-of-the-art results on the well-known KITTI benchmark. However, collecting…
Training my car to see using virtual worlds
This paper summarizes a research line consisting of training visual models using photo-realistic computer graphics, with a special focus on assisted and autonomous driving, and shows how the approach has become a growing trend with increasing acceptance.
Deep Monocular Visual Odometry for Ground Vehicle
The proposed motion focusing and decoupling approach improves visual odometry performance by reducing the relative pose error; the reduced dimensionality of the learning objective also makes the network much lighter, with only four convolution layers, so it converges quickly during training and runs in real time at test time.
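The "light network" idea in the summary above can be made concrete with a minimal sketch: a pose-regression CNN with only four convolution layers. This is an assumption-laden illustration, not the paper's architecture; the input format (two stacked grayscale frames), the layer widths, and the six-parameter pose output are all placeholder choices.

```python
# Hedged sketch of a deliberately small pose-regression CNN with four
# convolution layers. All architectural details are illustrative assumptions.
import torch
import torch.nn as nn

class TinyVONet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # global average pooling
        )
        self.pose = nn.Linear(64, 6)       # [tx, ty, tz, roll, pitch, yaw]

    def forward(self, frame_pair):         # frame_pair: (B, 2, H, W)
        return self.pose(self.features(frame_pair).flatten(1))
```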
DeepVO: A Deep Learning approach for Monocular Visual Odometry
A Convolutional Neural Network architecture is proposed, best suited for estimating the object's pose under known environment conditions, and it displays promising results when it comes to inferring the actual scale using just a single camera in real time.
Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes
This work proposes 3D-aware features computed from stereo color images in order to capture the appearance and depth peculiarities of objects in road scenes, and is the first work to report results with stereo data on the KITTI object challenge, achieving increased detection rates for the car and cyclist classes.
A Benchmark for Visual-Inertial Odometry Systems Employing Onboard Illumination
A dataset for evaluating the performance of visual-inertial odometry (VIO) systems employing an onboard light source, together with an analysis of several state-of-the-art VO and VIO frameworks.
Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art
This survey includes both the historically most relevant literature and the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving.

References

Showing 1–10 of 52 references
StereoScan: Dense 3d reconstruction in real-time
This paper proposes a novel approach for building 3D maps from high-resolution stereo sequences in real time, combining a sparse feature matcher with an efficient and robust visual odometry algorithm, and shows that the proposed odometry method achieves state-of-the-art accuracy.
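As a rough illustration of the mapping half of such a pipeline, the sketch below computes dense disparity on a rectified stereo pair and back-projects it to 3D points with OpenCV. The block matcher and all parameter values are stand-ins; the paper's own sparse feature matcher and odometry are not reproduced here.

```python
# Hedged sketch: dense disparity from a rectified stereo pair, back-projected
# to 3D points. Matcher choice and parameters are illustrative only.
import numpy as np
import cv2

def stereo_to_points(left_gray, right_gray, Q):
    """left_gray/right_gray: rectified 8-bit images;
    Q: 4x4 reprojection matrix from stereo calibration (e.g. cv2.stereoRectify)."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,                # smoothness penalties (illustrative)
        P2=32 * 5 * 5,
    )
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    points = cv2.reprojectImageTo3D(disparity, Q)   # HxWx3 metric coordinates
    valid = disparity > 0                           # keep pixels with a match
    return points[valid]
```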
A collection of outdoor robotic datasets with centimeter-accuracy ground truth
This work addresses both the practical and theoretical issues encountered while building a collection of six outdoor datasets, discussing how to estimate the 6-DoF vehicle path from the readings of three Real-Time Kinematic (RTK) GPS receivers, as well as the associated uncertainty bounds that can be employed to evaluate the performance of SLAM methods.
A new benchmark for stereo-based pedestrian detection
The paper furthermore quantifies the benefit of stereo vision for ROI generation and localization; at equal detection rates, false positives are reduced by a factor of 4–5 with stereo over mono, using the same HOG/linSVM classification component.
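For readers unfamiliar with the HOG/linSVM component mentioned above, a minimal sketch using scikit-image and scikit-learn is given below; the window size, HOG parameters, and regularization constant are illustrative assumptions rather than the benchmark's actual configuration.

```python
# Hedged sketch of a HOG + linear SVM window classifier.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(window_gray):
    # window_gray: e.g. a 128x64 grayscale pedestrian/background crop
    return hog(window_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def train_classifier(crops, labels):
    X = np.stack([hog_descriptor(c) for c in crops])
    clf = LinearSVC(C=0.01)           # C is an illustrative choice
    clf.fit(X, labels)
    return clf
```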
Visual odometry priors for robust EKF-SLAM
This work performs fast pose estimation using the two-stage RANSAC-based approach from [1], and incorporating the visual odometry prior into the EKF process yields better and more robust localization and mapping results than the constant linear and angular velocity model.
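A minimal sketch of what "incorporating a visual odometry prior into the EKF" can look like is shown below for a planar robot: the VO relative pose replaces the constant-velocity prediction, and its covariance is propagated through the motion Jacobians. The planar (x, y, yaw) parameterization is a simplification, not the cited system's formulation.

```python
# Hedged sketch: EKF prediction driven by a visual-odometry relative pose
# instead of a constant-velocity model. Planar state for brevity.
import numpy as np

def ekf_predict_with_vo(x, P, vo_delta, Q_vo):
    """x: state [x, y, yaw]; P: 3x3 state covariance;
    vo_delta: VO relative motion [dx, dy, dyaw] in the robot frame;
    Q_vo: 3x3 covariance of the VO estimate."""
    x = np.asarray(x, dtype=float)
    vo_delta = np.asarray(vo_delta, dtype=float)
    c, s = np.cos(x[2]), np.sin(x[2])
    R = np.array([[c, -s], [s, c]])

    x_new = x.copy()
    x_new[:2] += R @ vo_delta[:2]
    x_new[2] += vo_delta[2]

    # Jacobian of the motion model w.r.t. the state (heading couples into position)
    F = np.eye(3)
    F[0, 2] = -s * vo_delta[0] - c * vo_delta[1]
    F[1, 2] =  c * vo_delta[0] - s * vo_delta[1]
    # Jacobian w.r.t. the VO input: rotates the VO covariance into the world frame
    G = np.eye(3)
    G[:2, :2] = R

    P_new = F @ P @ F.T + G @ np.asarray(Q_vo) @ G.T
    return x_new, P_new
```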
Fast cost-volume filtering for visual correspondence and beyond
This paper proposes a generic and simple framework comprising three steps: constructing a cost volume, fast cost volume filtering, and winner-take-all label selection. It achieves state-of-the-art results, producing disparity maps in real time as well as optical flow fields with very fine structures and large displacements.
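The three-step framework is easy to prototype; the sketch below builds an absolute-difference cost volume, smooths each disparity slice with a box filter (the paper uses an edge-preserving guided filter), and picks the per-pixel winner-take-all label. The cost function and filter choice are deliberately simplified.

```python
# Hedged sketch of the cost volume -> filtering -> winner-take-all framework.
import numpy as np
from scipy.ndimage import uniform_filter

def wta_disparity(left, right, max_disp=64, radius=4):
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    cost = np.full((max_disp, h, w), 255.0, dtype=np.float32)
    for d in range(max_disp):
        # step 1: absolute-difference matching cost at disparity d
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
        # step 2: smooth the cost slice (a guided filter in the paper)
        cost[d] = uniform_filter(cost[d], size=2 * radius + 1)
    # step 3: winner-take-all label selection per pixel
    return cost.argmin(axis=0)
```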
Pedestrian Detection: An Evaluation of the State of the Art
An extensive evaluation of the state of the art in monocular pedestrian detection is presented in a unified framework, using sixteen pretrained state-of-the-art detectors across six data sets, and a refined per-frame evaluation methodology is proposed.
Automatic camera and range sensor calibration using a single shot
It is demonstrated that the proposed checkerboard corner detector significantly outperforms the current state of the art, and that the proposed camera-to-range registration method is able to discover multiple solutions in the case of ambiguities.
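For context, a standard multi-view checkerboard calibration with OpenCV is sketched below; the paper's contribution is a more robust corner detector and a single-shot camera-to-range registration, neither of which is part of this conventional pipeline. The board geometry and square size are placeholders.

```python
# Hedged sketch: conventional checkerboard detection and intrinsic calibration.
import numpy as np
import cv2

def calibrate_from_views(gray_images, board=(8, 6), square=0.1):
    # 3D corner coordinates of the board in its own plane (z = 0), in meters
    obj = np.zeros((board[0] * board[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for g in gray_images:
        found, corners = cv2.findChessboardCorners(g, board)
        if found:
            obj_pts.append(obj)
            img_pts.append(corners)
    # returns RMS reprojection error, camera matrix, distortion, extrinsics
    return cv2.calibrateCamera(obj_pts, img_pts,
                               gray_images[0].shape[::-1], None, None)
```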
Flow separation for fast and robust stereo odometry
Separating sparse flow provides fast and robust stereo visual odometry that handles the nearly degenerate situations that often arise in practical applications, avoiding the problem of nearly degenerate data under which RANSAC is known to return inconsistent results.
Towards a benchmark for RGB-D SLAM evaluation
A large dataset containing RGB-D image sequences and the ground-truth camera trajectories is provided and an evaluation criterion for measuring the quality of the estimated camera trajectory of visual SLAM systems is proposed.
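An absolute-trajectory-error style criterion of the kind proposed there can be sketched as follows: rigidly align the estimated camera positions to the ground truth and report the RMSE of the remaining differences. Timestamp association and the benchmark's full tooling are omitted; this is only a minimal approximation.

```python
# Hedged sketch: absolute trajectory error after SVD (Horn/Kabsch) alignment.
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of associated camera positions."""
    ce, cg = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    H = (est_xyz - ce).T @ (gt_xyz - cg)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                              # proper rotation (det = +1)
    t = cg - R @ ce
    aligned = est_xyz @ R.T + t                     # apply the rigid alignment
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))
```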
Depth Estimation Using Monocular and Stereo Cues
This paper shows that by adding monocular cues to stereo (triangulation) cues, significantly more accurate depth estimates are obtained than is possible using either monocular or stereo cues alone.
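As a toy illustration of why combining the two cues helps, the sketch below fuses a monocular and a stereo depth estimate per pixel by inverse-variance weighting; the paper itself learns the combination within a probabilistic model, so this closed-form fusion is only meant to convey the intuition that the monocular cue dominates where stereo triangulation is uncertain (e.g. at large depths).

```python
# Hedged sketch: inverse-variance fusion of monocular and stereo depth estimates.
import numpy as np

def fuse_depths(d_mono, var_mono, d_stereo, var_stereo):
    """All arguments are per-pixel arrays; variances encode cue confidence."""
    w_m, w_s = 1.0 / np.asarray(var_mono), 1.0 / np.asarray(var_stereo)
    return (w_m * d_mono + w_s * d_stereo) / (w_m + w_s)
```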