Are we ready for autonomous driving? The KITTI vision benchmark suite
An autonomous driving platform is used to develop novel, challenging benchmarks for stereo, optical flow, visual odometry/SLAM, and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
Vision meets robotics: The KITTI dataset
A novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research, using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras and a high-precision GPS/IMU inertial navigation system.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
To align movies and books, this work proposes a neural sentence embedding trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book.
The Role of Context for Object Detection and Semantic Segmentation in the Wild
A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene, which significantly helps in detecting objects at all scales.
Learning to Reweight Examples for Robust Deep Learning
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. It can be easily implemented on any type of deep network, requires no additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
Efficient Large-Scale Stereo Matching
A novel approach to binocular stereo for fast matching of high-resolution images that builds a prior on the disparities by forming a triangulation on a set of support points which can be robustly matched, reducing the matching ambiguities of the remaining points.
Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts
This work proposes a novel approach that handles large deformations and partial occlusions in animals in terms of body parts. Applied to the six animal categories in the PASCAL VOC dataset, it significantly improves on the state of the art (by 4.1% AP) and provides a richer representation for objects.
Monocular 3D Object Detection for Autonomous Driving
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground plane, and achieves the best detection performance among published monocular competitors on the challenging KITTI benchmark.
Order-Embeddings of Images and Language
A general method for learning ordered representations is introduced, and it is shown that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.