Publications
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
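The "16x16 words" in the title can be made concrete: the image is cut into non-overlapping 16x16 patches, and each patch is flattened into a vector that the Transformer consumes as one token. A minimal NumPy sketch of that patch-to-token step (`patchify` is an illustrative helper, not the paper's code; the subsequent linear projection and position embeddings are omitted):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Each patch is flattened into one vector, mirroring how ViT turns
    an image into a sequence of "words". Hypothetical helper for
    illustration only.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    tokens = (image
              .reshape(h // patch, patch, w // patch, patch, c)
              .transpose(0, 2, 1, 3, 4)       # group patches together
              .reshape(-1, patch * patch * c))  # flatten each patch
    return tokens  # shape: (num_patches, patch*patch*C)

img = np.zeros((224, 224, 3))
print(patchify(img).shape)  # (196, 768): 14x14 patches, 16*16*3 dims each
```

For a standard 224x224 RGB input this yields 196 tokens of dimension 768, which is exactly the sequence length and width commonly quoted for ViT-Base.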
FlowNet: Learning Optical Flow with Convolutional Networks
This paper constructs CNNs capable of solving optical flow estimation as a supervised learning task, and proposes and compares two architectures: a generic one and one with a layer that correlates feature vectors at different image locations.
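The correlation layer mentioned above compares, for every position in one feature map, the feature vector against feature vectors at nearby displacements in the other map. A simplified NumPy sketch (assumptions: stride 1, a square search window, zero padding at borders; the real layer is a batched, GPU-friendly op):

```python
import numpy as np

def correlation(f1, f2, max_disp=3):
    """Minimal sketch of a correlation layer in the spirit of FlowNet.

    For each spatial position in f1 (shape H, W, C), compute normalized
    dot products with f2's features at every displacement within
    +/- max_disp, producing one output channel per displacement.
    """
    h, w, c = f1.shape
    d = 2 * max_disp + 1
    out = np.zeros((h, w, d * d))
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    k = 0
    for dy in range(d):
        for dx in range(d):
            shifted = padded[dy:dy + h, dx:dx + w]          # f2 displaced
            out[:, :, k] = (f1 * shifted).sum(axis=-1) / c  # dot product
            k += 1
    return out  # shape: (H, W, (2*max_disp+1)**2)
```

The output gives the network an explicit matching signal between the two frames, which the generic architecture must otherwise learn implicitly.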
A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation
  • N. Mayer, Eddy Ilg, +4 authors T. Brox
  • Computer Science, Mathematics
    IEEE Conference on Computer Vision and Pattern…
  • 7 December 2015
This paper proposes three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks and presents a convolutional network for real-time disparity estimation that provides state-of-the-art results.
CARLA: An Open Urban Driving Simulator
This work introduces CARLA, an open-source simulator for autonomous driving research, and uses it to study the performance of three approaches to autonomous driving: a classic modular pipeline, an end-to-end model trained via imitation learning, and an end-to-end model trained via reinforcement learning.
Striving for Simplicity: The All Convolutional Net
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
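The substitution is easy to see in miniature: a 2x2 max-pool with stride 2 and a 2x2 convolution with stride 2 produce the same spatial downsampling, but the convolution uses learned weights instead of a fixed max. A single-channel NumPy sketch (function names are illustrative):

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max-pooling with stride 2 on an (H, W) map (H, W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def strided_conv2x2(x, kernel):
    """A 2x2 convolution with stride 2: the same downsampling factor as
    the pooling above, but weighted by a learnable 2x2 kernel."""
    h, w = x.shape
    patches = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    return (patches * kernel).sum(axis=(2, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(maxpool2x2(x).shape, strided_conv2x2(x, np.full((2, 2), 0.25)).shape)
# both (2, 2): identical spatial downsampling
```

With an averaging kernel the strided convolution behaves like average pooling; in the paper the kernel is learned jointly with the rest of the network.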
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
The concept of end-to-end learning of optical flow is advanced and shown to work well, and faster variants that allow optical flow computation at up to 140 fps with accuracy matching the original FlowNet are presented.
On Evaluation of Embodied Navigation Agents
This document summarizes the consensus recommendations of a working group on empirical methodology in navigation research: it discusses different problem statements and the role of generalization, presents evaluation measures, and provides standard scenarios that can be used for benchmarking.
End-to-End Driving Via Conditional Imitation Learning
This work evaluates different architectures for conditional imitation learning in vision-based driving and conducts experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area.
Generating Images with Perceptual Similarity Metrics based on Deep Networks
A class of loss functions, called deep perceptual similarity metrics (DeePSiM), is proposed that computes distances between image features extracted by deep neural networks; these losses better reflect the perceptual similarity of images and thus lead to better results.
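The core idea is a one-liner: instead of measuring distance between raw pixels, measure it between feature activations of a fixed deep network. A hedged NumPy sketch, where a fixed random linear map stands in for the deep feature extractor (an assumption for illustration; the paper uses trained CNN features):

```python
import numpy as np

def deepsim_distance(x, y, extract):
    """Sketch of a deep perceptual similarity metric (DeePSiM idea):
    compare images in the feature space of a fixed network `extract`
    rather than in pixel space. `extract` is a stand-in here; a real
    metric would use a trained CNN's activations."""
    fx, fy = extract(x), extract(y)
    return float(np.mean((fx - fy) ** 2))

# Stand-in "feature extractor": a fixed random linear map (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))
extract = lambda img: W @ img.ravel()

a = rng.normal(size=(8, 8))
b = a + 0.1 * rng.normal(size=(8, 8))
```

Used as a training loss for image generation, this replaces the pixel-space MSE that tends to produce blurry outputs.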
MLP-Mixer: An all-MLP Architecture for Vision
It is shown that while convolutions and attention are both sufficient for good performance, neither is necessary: MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs), attains competitive scores on image classification benchmarks.
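Mixer's replacement for attention alternates two kinds of MLPs on a (tokens, channels) matrix: one mixes information across tokens (applied per channel), the other across channels (applied per token). A heavily simplified NumPy sketch (assumptions: single linear map per MLP, no layer norm or nonlinearity; weight names are illustrative):

```python
import numpy as np

def mixer_block(x, w_tok, w_ch):
    """Simplified sketch of one MLP-Mixer layer with residual connections.

    x: (tokens, channels). Token mixing multiplies along the token axis
    (per channel); channel mixing multiplies along the channel axis
    (per token). The real block adds layer norm and two-layer MLPs.
    """
    x = x + (w_tok @ x)  # token mixing:  (T, T) @ (T, C) -> (T, C)
    x = x + (x @ w_ch)   # channel mixing: (T, C) @ (C, C) -> (T, C)
    return x

t, c = 196, 768  # e.g. 14x14 patch tokens with 768 channels
y = mixer_block(np.zeros((t, c)), np.zeros((t, t)), np.zeros((c, c)))
print(y.shape)  # (196, 768)
```

Note that neither operation is a convolution or an attention map: both are plain matrix multiplies with fixed-shape learned weights.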