Publications
Virtual Worlds as Proxy for Multi-object Tracking Analysis
TLDR
We propose an efficient real-to-virtual world cloning method, and validate our approach by building a new video dataset, called "Virtual KITTI", automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow.
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
TLDR
We propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound.
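The core idea is a per-class margin that grows as the class becomes rarer (scaling with n_j^{-1/4}). Below is a minimal PyTorch sketch of that formulation, assuming the published margin scaling and a logit-scaling constant; the `LDAMLoss` class name and default hyperparameters are illustrative, not the authors' released code.

```python
import torch
import torch.nn.functional as F

class LDAMLoss(torch.nn.Module):
    """Sketch of a label-distribution-aware margin loss.

    Each class j gets a margin proportional to n_j^{-1/4}, so rare
    classes are pushed further from the decision boundary.
    """
    def __init__(self, class_counts, max_margin=0.5, scale=30.0):
        super().__init__()
        margins = 1.0 / torch.tensor(class_counts, dtype=torch.float).pow(0.25)
        # Normalize so the largest per-class margin equals max_margin.
        self.register_buffer("margins", margins * (max_margin / margins.max()))
        self.scale = scale  # logit scaling constant (an assumed common choice)

    def forward(self, logits, target):
        # Subtract the class-dependent margin from the true-class logit,
        # then apply standard cross-entropy on the scaled logits.
        margin = self.margins[target]                          # (B,)
        adjusted = logits.scatter_add(
            1, target.unsqueeze(1), -margin.unsqueeze(1))      # z_y -> z_y - delta_y
        return F.cross_entropy(self.scale * adjusted, target)
```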
Exploring the Limitations of Behavior Cloning for Autonomous Driving
TLDR
We show that behavior cloning leads to state-of-the-art results, executing complex lateral and longitudinal maneuvers, even in unseen environments.
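For context, behavior cloning reduces driving to supervised regression from observations to expert controls. The sketch below is a deliberately simplified, generic PyTorch illustration; the `DrivingPolicy` network, the L1 objective, and the three-dimensional control output are assumptions, and it omits the command- and speed-conditioned branches such systems typically use.

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    # A small CNN policy: camera image in, (steer, throttle, brake) out.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, 3)

    def forward(self, image):
        return self.head(self.backbone(image))

def train_step(policy, optimizer, images, expert_actions):
    # Behavior cloning = plain supervised regression onto expert controls.
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(policy(images), expert_actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```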
3D Packing for Self-Supervised Monocular Depth Estimation
TLDR
In this work, we propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled monocular videos.
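The geometric part of this family of methods supervises depth and ego-motion by synthesizing the target frame from a neighboring one and penalizing the photometric error. The sketch below shows that generic view-synthesis warp in PyTorch; the function name, tensor layout, and use of `grid_sample` are assumptions about a typical pipeline, not PackNet's implementation.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, depth, pose, K, K_inv):
    """Warp a source frame into the target view using predicted depth
    of the target frame and the predicted relative camera pose.

    src_img: (B, 3, H, W) source RGB frame
    depth:   (B, 1, H, W) predicted target-frame depth
    pose:    (B, 4, 4) relative pose T_{target->source}
    K, K_inv:(B, 3, 3) camera intrinsics and their inverse
    """
    B, _, H, W = src_img.shape
    device = src_img.device
    # Pixel grid in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().view(1, 3, -1).expand(B, -1, -1)
    # Back-project target pixels to 3D points, move them to the source frame.
    cam = (K_inv @ pix) * depth.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_cam = (pose @ cam_h)[:, :3]
    # Project back to source pixel coordinates.
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    # Normalize to [-1, 1] and sample the source image.
    u = 2 * src_pix[:, 0] / (W - 1) - 1
    v = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)
    # A photometric loss between `warped` and the real target frame
    # provides the self-supervised training signal.
    return warped
```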
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
TLDR
We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval, optimizing directly a novel 3D loss formulation.
Learning to Fuse Things and Stuff
TLDR
We propose an end-to-end learning approach for panoptic segmentation, a novel task unifying instance (things) and semantic (stuff) segmentation.
Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
TLDR
This paper introduces a novel architecture for self-supervised monocular depth estimation that leverages semantic information from a fixed pretrained network to guide the generation of multi-level depth features via pixel-adaptive convolutions.
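Pixel-adaptive convolutions modulate a spatially shared kernel with a per-pixel affinity computed from guiding features (here, features from the fixed semantic network). The simplified PyTorch sketch below assumes a fixed Gaussian affinity and a square kernel; the function name and tensor shapes are illustrative rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def pixel_adaptive_conv(x, guide, weight, kernel_size=3):
    """Simplified pixel-adaptive convolution.

    x:      (B, C_in, H, W) input features
    guide:  (B, C_g, H, W) guidance features (e.g. semantic features)
    weight: (C_out, C_in, k, k) spatially shared convolution weights
    """
    B, C_in, H, W = x.shape
    k = kernel_size
    pad = k // 2
    # Gather k*k neighborhoods of both the input and the guidance features.
    x_unf = F.unfold(x, k, padding=pad).view(B, C_in, k * k, H, W)
    g_unf = F.unfold(guide, k, padding=pad).view(B, guide.shape[1], k * k, H, W)
    g_center = guide.unsqueeze(2)                                   # (B, C_g, 1, H, W)
    # Gaussian affinity between each pixel and its neighbors in guidance space.
    adapt = torch.exp(-0.5 * ((g_unf - g_center) ** 2).sum(dim=1, keepdim=True))
    # Modulate neighbors before applying the shared kernel.
    x_adapt = (x_unf * adapt).view(B, C_in * k * k, H, W)
    w = weight.view(weight.shape[0], C_in * k * k)
    return torch.einsum("oc,bchw->bohw", w, x_adapt)                # (B, C_out, H, W)
```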
Activity representation with motion hierarchies
TLDR
We learn hierarchical representations of activity videos in an unsupervised manner.
Actom sequence models for efficient action detection
TLDR
We address the problem of detecting actions, such as drinking or opening a door, in hours of challenging video data.
Procedural Generation of Videos to Train Deep Action Recognition Networks
TLDR
We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines.