• Publications
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks
TLDR
A deep cascaded multitask framework that exploits the inherent correlation between detection and alignment to boost their performance, achieving superior accuracy over state-of-the-art techniques on challenging face detection datasets and benchmarks.
A Discriminative Feature Learning Approach for Deep Face Recognition
TLDR
This paper proposes a new supervision signal, called center loss, for the face recognition task. It simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers.
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
TLDR
This work thoroughly studies three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improves each of them to derive an Enhanced SRGAN (ESRGAN), which achieves consistently better visual quality with more realistic and natural textures than SRGAN.
Action recognition with trajectory-pooled deep-convolutional descriptors
TLDR
This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features and deep-learned features and achieves superior performance to the state of the art on the evaluated datasets.
Detecting Text in Natural Image with Connectionist Text Proposal Network
TLDR
A novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. It develops a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy.
NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results
TLDR
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image), with a focus on the proposed solutions and results, and gauges the state of the art in single image super-resolution.
Towards Good Practices for Very Deep Two-Stream ConvNets
TLDR
This report presents very deep two-stream ConvNets for action recognition, adapting recent very deep architectures to the video domain, and extends the Caffe toolbox with a multi-GPU implementation that offers high computational efficiency and low memory consumption.
FOTS: Fast Oriented Text Spotting with a Unified Network
TLDR
This work proposes a unified, end-to-end trainable Fast Oriented Text Spotting (FOTS) network for simultaneous detection and recognition, sharing computation and visual information between the two complementary tasks, and introduces RoIRotate to share convolutional features between detection and recognition.
Temporal Segment Networks for Action Recognition in Videos
TLDR
The proposed framework, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. It won the video classification track at the ActivityNet challenge 2016 among 24 teams.