Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
We introduce a new dataset, Human3.6M, of 3.6 million accurate 3D human poses, acquired by recording the performance of 5 female and 6 male subjects under 4 different viewpoints, for training…
CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts
A novel framework that generates and ranks plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues. The algorithm is shown to work successfully in a segmentation-based visual object category recognition pipeline.
Constrained parametric min-cuts for automatic object segmentation
It is shown that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC09 segmentation dataset and achieves the same average best segmentation covering as the best performing technique to date.
Semantic Segmentation with Second-Order Pooling
This paper introduces multiplicative second-order analogues of average and max-pooling that, together with appropriate non-linearities, lead to state-of-the-art performance on free-form region recognition without any type of feature coding.
Twin Gaussian Processes for Structured Prediction
We describe twin Gaussian processes (TGP), a generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by…
The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
A fast, simple, yet powerful non-parametric Moving Pose (MP) framework that enables low-latency recognition, one-shot learning, and action detection in difficult unsegmented sequences. It is real-time and scalable, and it outperforms more sophisticated approaches on challenging benchmarks.
Matrix Backpropagation for Deep Networks with Structured Layers
A sound mathematical apparatus for formally integrating global structured computation into deep computation architectures. It is demonstrated that deep networks relying on second-order pooling and normalized cuts layers, trained end-to-end using matrix backpropagation, outperform counterparts that do not take advantage of such global layers.
Efficient Match Kernel between Sets of Features for Visual Recognition
It is shown that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same region of the partition induced by the visual words and 0 otherwise.
Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition
This work complements existing state-of-the-art, large-scale, annotated dynamic computer vision datasets like Hollywood-2 and UCF Sports with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. It also introduces novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects.
Discriminative density propagation for 3D human motion estimation
The density propagation rules for discriminative inference in continuous, temporal chain models are established and flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models are proposed.