Deep High-Resolution Representation Learning for Visual Recognition
The superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, is shown, suggesting that the HRNet is a stronger backbone for computer vision problems.
Multiple Instance Detection Network with Online Instance Classifier Refinement
This work formulates weakly supervised object detection as a Multiple Instance Learning (MIL) problem, where instance classifiers (object detectors) are put into the network as hidden nodes, and instance labels inferred from weak supervision are propagated to their spatially overlapped instances to refine the instance classifiers online.
CCNet: Criss-Cross Attention for Semantic Segmentation
This work proposes a Criss-Cross Network (CCNet) for obtaining contextual information in a more effective and efficient way, and achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are the new state-of-the-art results.
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no…
Mask Scoring R-CNN
This paper proposes Mask Scoring R-CNN, which contains a network block that learns the quality of the predicted instance masks and calibrates the misalignment between mask quality and mask score, improving instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation.
High-Resolution Representations for Labeling Pixels and Regions
A simple modification is introduced to augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions rather than only the representation from the high-resolution convolution, which leads to stronger representations, evidenced by superior results.
Robust Scene Text Recognition with Automatic Rectification
This work proposes RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text and is end-to-end trainable, requiring only images and associated text labels, which makes it convenient to train and deploy in practical systems.
Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing
This paper proposes to train a semantic segmentation network starting from the discriminative regions, progressively increasing the pixel-level supervision by seeded region growing, and obtains state-of-the-art performance.
PCL: Proposal Cluster Learning for Weakly Supervised Object Detection
This paper first shows that instances can be assigned object or background labels directly based on proposal clusters for instance classifier refinement, and then shows that treating each cluster as a small new bag yields fewer ambiguities than the direct label assignment method.