• Publications
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, their advantage over traditional methods is less evident.
Towards Good Practices for Very Deep Two-Stream ConvNets
This report presents very deep two-stream ConvNets for action recognition by adapting recent very deep architectures to the video domain, and extends the Caffe toolbox with a multi-GPU implementation that offers high computational efficiency and low memory consumption.
Temporal Segment Networks for Action Recognition in Videos
The proposed temporal segment network (TSN) framework models long-range temporal structure with a new segment-based sampling and aggregation scheme, and won the video classification track of the ActivityNet Challenge 2016 among 24 teams.
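The segment-based sampling and aggregation idea can be illustrated with a minimal sketch: split a video into K equal segments, draw one snippet per segment, and fuse the per-snippet class scores with a segmental consensus (averaging is one common choice). Function names and the averaging consensus here are illustrative assumptions, not the paper's exact implementation.

```python
import random

def tsn_sample(num_frames, k=3):
    """Segment-based sampling sketch: split the frame indices of a video
    into k equal segments and randomly pick one snippet index from each.
    (Illustrative; the actual TSN pipeline samples frames/flow stacks.)"""
    seg_len = num_frames // k
    return [i * seg_len + random.randrange(seg_len) for i in range(k)]

def tsn_consensus(snippet_scores):
    """Segmental consensus sketch: aggregate per-snippet class scores by
    averaging them into a single video-level score vector."""
    n = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / n for c in range(num_classes)]
```

For example, with a 30-frame video and k=3, `tsn_sample` returns one index from frames 0-9, one from 10-19, and one from 20-29, so the snippets cover the whole duration rather than a single short window.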
Real-Time Action Recognition with Enhanced Motion Vector CNNs
This paper accelerates the deep two-stream architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra computation, and introduces three training strategies: initialization transfer, supervision transfer, and their combination.
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016
This paper uses the latest deep model architectures, e.g., ResNet and Inception V3, introduces new aggregation schemes (top-k and attention-weighted pooling), and incorporates audio as a complementary channel, extracting relevant information via a CNN applied to spectrograms.
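Top-k pooling can be sketched as follows: instead of averaging a class score over all frames, average only the k largest per-frame scores, so a few confident frames are not drowned out by background. This is a minimal illustration of the general idea, not the submission's exact implementation.

```python
def topk_pool(frame_scores, k=2):
    """Top-k pooling sketch: for each class, average the k largest
    per-frame scores rather than all of them.

    frame_scores: list of per-frame score vectors, one vector per frame.
    Returns one pooled score per class."""
    num_classes = len(frame_scores[0])
    pooled = []
    for c in range(num_classes):
        vals = sorted((f[c] for f in frame_scores), reverse=True)
        pooled.append(sum(vals[:k]) / k)
    return pooled
```

Setting k=1 reduces this to max pooling, and k equal to the number of frames reduces it to average pooling, so top-k interpolates between the two.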
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
A hybrid representation is proposed that leverages the discriminative capacity of CNNs and the simplicity of descriptor-encoding schemes for image recognition, with a focus on scene recognition; it achieves excellent performance on two standard benchmarks.
Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs
A two-stream-based real-time action recognition approach that replaces optical flow (OF) with motion vectors (MVs) via a deeply transferred MV CNN; it is significantly faster than OF-based approaches and achieves a processing speed of 390.7 frames per second, surpassing the real-time requirement.
Transferring Deep Object and Scene Representations for Event Recognition in Still Images
This paper empirically investigates the correlation among object, scene, and event concepts, proposes an iterative selection method to identify the subset of object and scene classes most relevant for representation transfer, and develops three transfer techniques.
Exploring Fisher vector and deep networks for action spotting
This paper describes our method and attempt on track 2 of the ChaLearn Looking at People (LAP) challenge 2015, where the Fisher vector approach achieves a Jaccard index of 0.5385 and ranks first in the track.
CUHK & SIAT Submission for THUMOS 15 Action Recognition Challenge
This paper presents the method of our submission for the THUMOS15 action recognition challenge. We propose a new action recognition system by exploiting very deep two-stream ConvNets and Fisher vector representations.