• Publications
  • Influence
Learning from massive noisy labeled data for image classification
A general framework to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels is introduced and the relationships between images, class labels and label noises are model with a probabilistic graphical model and further integrate it into an end-to-end deep learning system. Expand
CNN-RNN: A Unified Framework for Multi-label Image Classification
The proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image- label relevance, and it can be trained end-to-end from scratch to integrate both information in a unified framework. Expand
Attention to Scale: Scale-Aware Semantic Image Segmentation
An attention mechanism that learns to softly weight the multi-scale features at each pixel location is proposed, which not only outperforms averageand max-pooling, but allows us to diagnostically visualize the importance of features at different positions and scales. Expand
Unsupervised Person Re-identification: Clustering and Fine-tuning
A progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains and demonstrates that PUL outputs discriminative features that improve the re-ID accuracy. Expand
ActBERT: Learning Global-Local Video-Text Representations
  • Linchao Zhu, Y. Yang
  • Computer Science
  • IEEE/CVF Conference on Computer Vision and…
  • 1 June 2020
This paper introduces ActBERT for self-supervised learning of joint video-text representations from unlabeled data and introduces an ENtangled Transformer block to encode three sources of information, i.e., global actions, local regional objects, and linguistic descriptions. Expand
Entangled Transformer for Image Captioning
A Transformer-based sequence modeling framework built only with attention layers and feedforward layers that enables the Transformer to exploit semantic and visual information simultaneously and achieves state-of-the-art performance on the MSCOCO image captioning dataset. Expand
Simultaneous determination of seven bisphenols in environmental water and solid samples by liquid chromatography-electrospray tandem mass spectrometry.
This method was successfully applied to real environmental sample analysis, which revealed that all of the tested BPs were present, with the exception of BPB. Expand
Content-Consistent Matching for Domain Adaptive Semantic Segmentation
This paper considers the adaptation of semantic segmentation from the synthetic source domain to the real target domain and proposes a content-consistent matching (CCM) model, which yields consistent improvements over the baselines and performs favorably against previous state-of-the-arts tasks. Expand
Salience-Guided Cascaded Suppression Network for Person Re-Identification
A novel Salience-guided Cascaded Suppression Network (SCSN) which enables the model to mine diverse salient features and integrate these features into the final representation by a cascaded manner and develops an efficient feature aggregation strategy that fully increases the network’s capacity for all potential salience features. Expand