Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
This work introduces the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformers to various dense prediction tasks. PVT can serve as a useful alternative backbone for pixel-level prediction and facilitate future research.
Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
This work proposes a novel visual try-on network, the Adaptive Content Generating and Preserving Network (ACGPN), which generates photo-realistic images with much better perceptual quality and richer fine details.
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
D4LCN overcomes the limitations of conventional 2D convolutions and narrows the gap between image and 3D point cloud representations by learning the filters and their receptive fields automatically from image-based depth maps.
TransTrack: Multiple-Object Tracking with Transformer
This work proposes TransTrack, a Transformer-based baseline for multiple-object tracking (MOT) that introduces a set of learned object queries into the pipeline to detect newly appearing objects. The resulting method, built on the query-key mechanism, is simple and effective and achieves a competitive 65.8% MOTA on the MOT17 challenge dataset.
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images: the generator is fine-tuned on the fly in a progressive manner, regularized by the feature distance obtained from the GAN's discriminator.
Online Knowledge Distillation via Collaborative Learning
This work designs multiple methods to generate soft targets as supervision, by ensembling the students' predictions and distorting the input images. These methods consistently improve the generalization ability of deep neural networks (DNNs) with different learning capacities.
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
CamNet is a coarse-to-fine retrieval-based deep learning framework with three steps: image-based coarse retrieval, pose-based fine retrieval, and precise relative pose regression. It outperforms state-of-the-art methods by a large margin on both indoor and outdoor datasets.
PVTv2: Improved Baselines with Pyramid Vision Transformer
This work presents new baselines that improve the original Pyramid Vision Transformer (PVTv1) with three designs: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers.
DetCo: Unsupervised Contrastive Learning for Object Detection
Extensive experiments demonstrate that DetCo not only outperforms recent methods on a series of 2D and 3D instance-level detection tasks but is also competitive on image classification.
Segmenting Transparent Object in the Wild with Transformer
This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, which extends Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike …