We introduce the Imperial College London and National University of Ireland Maynooth (ICL-NUIM) dataset for the evaluation of visual odometry, 3D reconstruction and SLAM algorithms that typically use RGB-D data. We present a collection of handheld RGB-D camera sequences within synthetically generated environments. RGB-D sequences with perfect ground truth…
We propose a novel deep architecture, SegNet, for semantic pixel-wise image labelling. SegNet has several attractive properties: (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling, which improves accuracy, and (iii) it is easy…
We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal…
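To make "convolutional LSTM cells that integrate changes over time" concrete, here is a minimal single-channel ConvLSTM step in plain Python. It assumes the common gate formulation (input, forget and output gates plus a tanh candidate, no peephole connections) with zero-padded "same" convolutions; the function names, the weight layout in `params` and the toy weights are all hypothetical, and this is not the authors' implementation.

```python
import math


def conv2d_same(img, kernel):
    """Naive 2-D cross-correlation with zero padding; output has the input's size."""
    h, w = len(img), len(img[0])
    k = len(kernel)  # assumes a square, odd-sized kernel
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * img[yy][xx]
            out[y][x] = acc
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def convlstm_step(x, h, c, params):
    """One ConvLSTM update on single-channel maps x (input), h (hidden), c (cell).

    params maps each gate name 'i', 'f', 'o', 'g' to (Wx, Wh, b): an input
    kernel, a hidden-state kernel and a scalar bias (hypothetical layout).
    """
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate, (wx, wh, b) in params.items():
        cx, ch = conv2d_same(x, wx), conv2d_same(h, wh)
        pre[gate] = [[cx[r][q] + ch[r][q] + b for q in range(cols)] for r in range(rows)]
    new_c = [[0.0] * cols for _ in range(rows)]
    new_h = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            i = sigmoid(pre["i"][r][q])    # input gate: how much new content to write
            f = sigmoid(pre["f"][r][q])    # forget gate: how much old memory to keep
            o = sigmoid(pre["o"][r][q])    # output gate: how much memory to expose
            g = math.tanh(pre["g"][r][q])  # candidate content
            new_c[r][q] = f * c[r][q] + i * g
            new_h[r][q] = o * math.tanh(new_c[r][q])
    return new_h, new_c


if __name__ == "__main__":
    # Toy weights: every gate sees a 3x3 box average of the input and ignores h.
    box = [[1.0 / 9.0] * 3 for _ in range(3)]
    zero = [[0.0] * 3 for _ in range(3)]
    params = {gate: (box, zero, 0.0) for gate in "ifog"}
    h = [[0.0] * 5 for _ in range(5)]
    c = [[0.0] * 5 for _ in range(5)]
    # A bright pixel moving one column per frame along row 2.
    for t in range(3):
        frame = [[1.0 if (r, q) == (2, t + 1) else 0.0 for q in range(5)] for r in range(5)]
        h, c = convlstm_step(frame, h, c, params)
    print([round(v, 3) for v in h[2]])
```

Because the gates are computed by convolutions rather than dense layers, the memory `c` keeps its spatial layout, so the trail left by the moving pixel stays localised in row 2 across frames.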
Scene understanding is a prerequisite to many high-level tasks for any automated intelligent machine operating in real-world environments. Recent attempts with supervised learning have shown promise in this direction but have also highlighted the need for enormous quantities of supervised data: performance increases in proportion to the amount of data used. …
Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance: they need to contain semantics. We address this challenge by…
Higher frame-rates promise better tracking of rapid motion, but advanced real-time vision systems rarely exceed the standard 10–60 Hz range, on the grounds that the computation required would be too great. In fact, the extra computation demanded by a higher frame-rate is mitigated by a reduced computational cost per frame in trackers that take advantage of prediction. Additionally, when we…
Hanme Kim¹ (hanme.kim@imperial.ac.uk), Ankur Handa² (ah781@cam.ac.uk), Ryad Benosman³ (ryad.benosman@upmc.fr), Sio-Hoi Ieng³ (sio-hoi.ieng@upmc.fr), Andrew J. Davison¹ (a.davison@imperial.ac.uk). ¹ Department of Computing, Imperial College London, London, UK; ² Department of Engineering, University of Cambridge, Cambridge, UK; ³ INSERM, U968, Paris, F-75012, France; …
We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical…
In matching tasks in computer vision, and particularly in real-time tracking from video, there are generally strong priors available on absolute and relative correspondence locations thanks to motion and scene models. While these priors are often partially used post hoc to resolve matching consensus in algorithms like RANSAC, it was recently shown that…
We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be…
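As one illustration of a "parametric transformation" of the kind used in geometric computer vision, the sketch below maps a 3-vector ω in axis-angle form to a rotation matrix with Rodrigues' formula, R = I + (sin θ / θ)[ω]× + ((1 − cos θ) / θ²)[ω]×² with θ = ‖ω‖, and applies it to 3-D points. Choosing a rotation layer as the example is an assumption on our part; the function names are hypothetical and this is plain Python, not gvnn's Torch code.

```python
import math


def so3_exp(omega):
    """Rodrigues' formula: axis-angle vector omega (3 floats) -> 3x3 rotation matrix."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric (cross-product) matrix [omega]_x.
    K = [[0.0, -wz, wy],
         [wz, 0.0, -wx],
         [-wy, wx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    if theta < 1e-8:
        # Small-angle limits: sin(t)/t -> 1 and (1 - cos t)/t^2 -> 1/2.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]


def rotate_points(omega, points):
    """Forward pass of a hypothetical rotation layer: apply exp([omega]_x) to each point."""
    R = so3_exp(omega)
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3)) for p in points]


if __name__ == "__main__":
    # A quarter turn about the z-axis sends the x-axis to (approximately) the y-axis.
    print(rotate_points((0.0, 0.0, math.pi / 2), [(1.0, 0.0, 0.0)]))
```

Parameterising the rotation by an unconstrained 3-vector means a network can regress ω directly and every output is a valid rotation; the small-angle branch avoids dividing by zero near the identity. In a learning framework, automatic differentiation would supply the gradients of this forward pass.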