We propose a novel deep architecture, SegNet, for semantic pixel-wise image labelling. SegNet has several attractive properties: (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions; (ii) with increasing depth, a larger context is considered for pixel labelling, which improves accuracy; and (iii) it is easy …
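The "forward evaluation only" property in (i) relates to how SegNet decoders upsample: the encoder records max-pooling indices, and the decoder places values back at those positions. A toy NumPy sketch of that mechanism (function names are my own; this is an illustration, not the paper's implementation):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records argmax positions (SegNet-style)."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)  # flat index into x
    for i in range(h // k):
        for j in range(w // k):
            patch = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            p = int(np.argmax(patch))
            out[i, j] = patch.flat[p]
            idx[i, j] = (i * k + p // k) * w + (j * k + p % k)
    return out, idx

def unpool(pooled, idx, shape):
    """Sparse upsampling: place each value back at its recorded argmax."""
    up = np.zeros(shape)
    up.flat[idx.ravel()] = pooled.ravel()
    return up
```

Chaining pooling in the encoder with index-based unpooling in the decoder is what lets dense predictions stay sharp without learning a separate upsampling map.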
We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal …
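A minimal sketch of the gated update such an LSTM memory performs per step, with the convolutions reduced to per-pixel (1x1) weights purely to keep the example short; the weight names are invented for illustration and this is not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, W):
    """One ConvLSTM-style update with convolutions collapsed to scalar
    (per-pixel) weights. x: input frame, h: hidden state, c: cell memory."""
    i = sigmoid(W['wxi'] * x + W['whi'] * h + W['bi'])  # input gate
    f = sigmoid(W['wxf'] * x + W['whf'] * h + W['bf'])  # forget gate
    o = sigmoid(W['wxo'] * x + W['who'] * h + W['bo'])  # output gate
    g = np.tanh(W['wxg'] * x + W['whg'] * h + W['bg'])  # candidate update
    c = f * c + i * g          # memory integrates changes over time
    h = o * np.tanh(c)         # emitted state
    return h, c
```

Iterating this step over a frame sequence is what lets the memory accumulate motion information across time while remaining fully differentiable.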
… for the evaluation of visual odometry, 3D reconstruction and SLAM algorithms that typically use RGB-D data. We present a collection of handheld RGB-D camera sequences within synthetically generated environments. RGB-D sequences with perfect ground-truth poses are provided, as well as a ground-truth surface model that enables a method of quantitatively …
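Perfect ground-truth poses enable direct quantitative trajectory evaluation. As one common example (not necessarily the metric used in this work), a root-mean-square absolute trajectory error over camera positions, translation-only and centroid-aligned for brevity, can be sketched as:

```python
import numpy as np

def ate_rmse(est, gt):
    """RMS absolute trajectory error between estimated and ground-truth
    camera positions (N x 3 arrays), after removing the mean offset."""
    est = est - est.mean(axis=0)   # centroid alignment only; a full
    gt = gt - gt.mean(axis=0)      # evaluation would also align rotation
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))
```

A constant translation offset between the trajectories yields zero error after alignment; only shape differences between the paths are penalised.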
Higher frame-rates promise better tracking of rapid motion, but advanced real-time vision systems rarely exceed the standard 10–60 Hz range, the usual argument being that the computation required would be too great. In fact, the cost of increasing frame-rate is mitigated by the reduced computational cost per frame in trackers which take advantage of prediction. Additionally, when we …
An event camera is a silicon retina which outputs not a sequence of video frames like a standard camera, but a stream of asynchronous spikes, each with pixel location, sign and precise timing, indicating when individual pixels record a threshold log-intensity change (positive or negative). By encoding only image change, it offers the potential to transmit …
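The event stream described above can be modelled as records of (pixel location, sign, timestamp). A toy sketch, where the record layout and the threshold value `theta` are assumptions for illustration rather than any camera's actual interface, that integrates events into an approximate log-intensity change image:

```python
import numpy as np
from collections import namedtuple

# Hypothetical minimal event record: pixel location, polarity, timestamp.
Event = namedtuple('Event', ['x', 'y', 'sign', 't'])

def accumulate(events, shape, theta=0.1):
    """Integrate a stream of threshold-crossing events into an approximate
    log-intensity change image: each event contributes +/- theta."""
    img = np.zeros(shape)
    for e in events:
        img[e.y, e.x] += e.sign * theta
    return img
```

Because pixels that see no change emit nothing, the stream is sparse, which is the basis of the bandwidth advantage the abstract alludes to.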
Scene understanding is a prerequisite to many high-level tasks for any automated intelligent machine operating in real-world environments. Recent attempts with supervised learning have shown promise in this direction but have also highlighted the need for an enormous quantity of supervised data: performance increases in proportion to the amount of data used.
Figure 1: Our system is trained exclusively on synthetic data … [Diagram: a rendering engine produces 4-D inputs with ground-truth annotations (depth, height from ground, angle with gravity, curvature); encoders and decoders are trained via back-propagation with label fusion; at run time, camera poses along a trajectory yield depth/TSDF inputs from different viewpoints, fed forward to produce predictions.]