• Publications
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
TLDR
This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset.
  • 1,933
  • 586
The Kinetics Human Action Video Dataset
We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken…
  • 919
  • 181
CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts
TLDR
We present a novel framework to generate and rank plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues.
  • 589
  • 72
Constrained parametric min-cuts for automatic object segmentation
TLDR
We present a novel framework for generating and ranking plausible object hypotheses in an image using bottom-up processes and mid-level cues, by solving a sequence of constrained parametric min-cut problems (CPMC).
  • 471
  • 51
Semantic Segmentation with Second-Order Pooling
TLDR
We introduce multiplicative second-order analogues of average and max-pooling that, together with appropriate nonlinearities, lead to state-of-the-art performance on free-form region recognition, without any type of feature coding.
  • 424
  • 48
Human Pose Estimation with Iterative Error Feedback
TLDR
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.
  • 430
  • 36
Learning to See by Moving
TLDR
The current dominant paradigm for feature learning in computer vision relies on training neural networks for the task of object recognition using millions of hand-labelled images.
  • 398
  • 30
Category-specific object reconstruction from a single image
TLDR
This paper introduces an automated pipeline that takes pixels as input and outputs 3D surfaces for various rigid categories in images of realistic scenes, and that can be driven by noisy automatic object segmentations.
  • 220
  • 16
Object Recognition by Sequential Figure-Ground Ranking
TLDR
We present an approach to visual object-class segmentation and recognition based on a pipeline that combines multiple figure-ground hypotheses with large object spatial support, generated by bottom-up computational processes that do not exploit knowledge of specific categories, and sequential categorization based on continuous estimates of the spatial overlap between the image segment hypotheses and each putative class.
  • 120
  • 16
Pedestrian detection combining RGB and dense LIDAR data
TLDR
We train a state-of-the-art deformable parts detector using different configurations of optical images and their associated 3D point clouds, leveraging the recently released KITTI dataset.
  • 133
  • 14