• Publications
  • Influence
Learning Spatiotemporal Features with 3D Convolutional Networks
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. Expand
Interactive Facial Feature Localization
An improvement to the Active Shape Model is proposed that allows for greater independence among the facial components and improves on the appearance fitting step by introducing a Viterbi optimization process that operates along the facial contours. Expand
Poselets: Body part detectors trained using 3D human pose annotations
A new dataset, H3D, is built of annotations of humans in 2D photographs with 3D joint information, inferred using anthropometric constraints, to address the classic problems of detection, segmentation and pose estimation of people in images with a novel definition of a part, a poselet. Expand
Semantic contours from inverse detectors
A simple yet effective method for combining generic object detectors with bottom-up contours to identify object contours is presented and a principled way of combining information from different part detectors and across categories is provided. Expand
PANDA: Pose Aligned Networks for Deep Attribute Modeling
A new method which combines part-based models and deep learning by training pose-normalized CNNs for inferring human attributes from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion is proposed. Expand
C3D: Generic Features for Video Analysis
Convolution 3D feature is proposed, a generic spatio-temporal feature obtained by training a deep 3-dimensional convolutional network on a large annotated video dataset comprising objects, scenes, actions, and other frequently occurring concepts that encapsulate appearance and motion cues and perform well on different video classification tasks. Expand
Real-Time Adaptive Image Compression
A machine learning-based approach to lossy image compression which outperforms all existing codecs, while running in real-time, and supplementing the approach with adversarial training specialized towards use in a compression setting. Expand
Robust object detection via soft cascade
We describe a method for training object detectors using a generalization of the cascade architecture, which results in a detection rate and speed comparable to that of the best published detectorsExpand
Beyond frontal faces: Improving Person Recognition using multiple cues
The Pose Invariant PErson Recognition (PIPER) method is proposed, which accumulates the cues of poselet-level person recognizers trained by deep convolutional networks to discount for the pose variations, combined with a face recognizer and a global recognizer. Expand
Compressing Deep Convolutional Networks using Vector Quantization
This paper is able to achieve 16-24 times compression of the network with only 1% loss of classification accuracy using the state-of-the-art CNN, and finds in terms of compressing the most storage demanding dense connected layers, vector quantization methods have a clear gain over existing matrix factorization methods. Expand