Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Introduces the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting the Transformer to various dense prediction tasks and can serve as an alternative backbone for pixel-level prediction, facilitating future research.
Enhanced Computer Vision With Microsoft Kinect Sensor: A Review
A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.
Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement
The proposed spatiotemporal saliency detection method introduces local and global contrast saliency measures using foreground and background information estimated from the gradient flow field, and is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances.
Learning Discriminative Representations from RGB-D Video Data
This paper introduces an adaptive learning methodology that automatically extracts (holistic) spatio-temporal features from RGB-D video data for visual recognition tasks, simultaneously fusing the RGB and depth information, with significant advantages over state-of-the-art hand-crafted and machine-learned features.
HRank: Filter Pruning Using High-Rank Feature Map
This paper proposes a novel filter pruning method by exploring the High Rank of feature maps (HRank), inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive.
Viewpoint-Aware Attentive Multi-view Inference for Vehicle Re-identification
  • Yi Zhou, L. Shao
  • Computer Science
    IEEE/CVF Conference on Computer Vision and…
  • 1 June 2018
A Viewpoint-aware Attentive Multi-view Inference (VAMI) model that requires only visual information to solve the multi-view vehicle re-ID problem and achieves consistent improvements over state-of-the-art vehicle re-ID methods on two public datasets: VeRi and VehicleID.
Video Salient Object Detection via Fully Convolutional Networks
A novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables the deep video saliency network to learn diverse saliency information and prevents overfitting with the limited number of training videos.
Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images
A novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices; it outperforms most cutting-edge segmentation models and advances the state of the art.
Deep Sketch Hashing: Fast Free-Hand Sketch-Based Image Retrieval
This paper introduces a novel binary coding method, named Deep Sketch Hashing (DSH), in which a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework; it is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture.
Deep Learning for Person Re-identification: A Survey and Outlook
A comprehensive overview with in-depth analysis of closed-world person Re-ID is conducted from three perspectives: deep feature representation learning, deep metric learning, and ranking optimization. A new evaluation metric (mINP) is also introduced; it indicates the cost of finding all the correct matches and provides an additional criterion for evaluating Re-ID systems.
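
To make the mINP idea in the last summary concrete, here is a minimal sketch. It assumes the commonly cited definition in which, for each query, the inverse negative penalty (INP) is the number of correct matches divided by the rank position of the hardest (last-retrieved) correct match, and mINP is the mean over queries. The function names and the boolean-list input format are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
def inverse_negative_penalty(ranked_is_match):
    """INP for one query (assumed definition: |G| / R_hard).

    ranked_is_match: list of booleans, one per gallery item in ranked
    order, True where the item is a correct match for the query.
    """
    # 1-based rank positions of every correct match in the ranked list.
    positions = [i + 1 for i, hit in enumerate(ranked_is_match) if hit]
    if not positions:
        raise ValueError("query has no correct match in the gallery")
    # The last position is the rank of the hardest correct match.
    return len(positions) / positions[-1]


def mean_inp(all_rankings):
    """mINP: average INP over all queries."""
    scores = [inverse_negative_penalty(r) for r in all_rankings]
    return sum(scores) / len(scores)


# Hypothetical example: two queries with two correct matches each.
rankings = [
    [True, True, False],         # both matches at the top: INP = 2/2
    [True, False, False, True],  # hardest match at rank 4: INP = 2/4
]
print(mean_inp(rankings))
```

Under this definition, a query whose correct matches all sit at the top of the ranking scores 1, and the score drops as the hardest match sinks lower, which is how the metric reflects the cost of finding every correct match.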