• Publications
Learning Spatiotemporal Features with 3D Convolutional Networks
TLDR
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
A Closer Look at Spatiotemporal Convolutions for Action Recognition
TLDR
A new spatiotemporal convolutional block "R(2+1)D" is designed which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors
TLDR
A reconstruction method is proposed that uses a Probabilistic Principal Components Analysis (PPCA) shape model, together with an estimation algorithm that simultaneously estimates 3D shape and motion for each time instant, learns the PPCA model parameters, and robustly fills in missing data points.
Efficient Object Category Recognition Using Classemes
TLDR
A new descriptor for images is introduced which allows the construction of efficient and compact classifiers with good accuracy on object category recognition, and allows object-category queries to be made against image databases using efficient classifiers such as linear support vector machines.
C3D: Generic Features for Video Analysis
TLDR
The Convolution 3D (C3D) feature is proposed: a generic spatio-temporal feature obtained by training a deep 3-dimensional convolutional network on a large annotated video dataset comprising objects, scenes, actions, and other frequently occurring concepts. The learned features encapsulate appearance and motion cues and perform well on different video classification tasks.
Feature Correspondence Via Graph Matching: Models and Global Optimization
TLDR
A novel graph matching optimization technique, which is referred to as dual decomposition (DD), is described, and it is demonstrated on a variety of examples that this method outperforms existing graph matching algorithms.
Video Classification With Channel-Separated Convolutional Networks
TLDR
It is empirically demonstrated that the amount of channel interaction plays an important role in the accuracy of 3D group convolutional networks, and this leads to an architecture, the Channel-Separated Convolutional Network (CSN), which is simple, efficient, yet accurate.
Is Space-Time Attention All You Need for Video Understanding?
TLDR
This paper presents a convolution-free approach to video classification built exclusively on self-attention over space and time, and suggests that “divided attention,” where temporal attention and spatial attention are separately applied within each block, leads to the best video classification accuracy among the design choices considered.
Tracking and modeling non-rigid objects with rank constraints
TLDR
A novel solution is presented for flow-based tracking and 3D reconstruction of deforming objects in monocular image sequences. Deforming objects are approximated using a linear combination of 3D basis shapes, and the rank constraint is used to achieve robust and precise low-level optical flow estimation.
Large Margin Component Analysis
TLDR
This paper proposes a method that solves for a low-dimensional projection of the inputs that minimizes a metric objective aimed at separating points in different classes by a large margin, and that reduces the risk of overfitting.