Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Wenhai Wang, Enze Xie, L. Shao
- Computer Science, IEEE International Conference on Computer Vision
- 24 February 2021
The Pyramid Vision Transformer (PVT) is introduced, which overcomes the difficulties of porting Transformer to various dense prediction tasks and is validated through extensive experiments, showing that it boosts the performance of many downstream tasks, including object detection, instance and semantic segmentation.
PVTv2: Improved Baselines with Pyramid Vision Transformer
- Wenhai Wang, Enze Xie, Ling Shao
- Computer Science, Computational Visual Media
- 25 June 2021
This work improves the original Pyramid Vision Transformer (PVT v1) with three designs: a linear-complexity attention layer, an overlapping patch embedding, and a convolutional feed-forward network. Together they reduce the computational complexity of PVT v1 to linear and provide significant improvements on fundamental vision tasks.
Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
- Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, W. Zuo, P. Luo
- Computer Science, Computer Vision and Pattern Recognition
- 1 June 2020
This work proposes a novel visual try-on network, the Adaptive Content Generating and Preserving Network (ACGPN), which can generate photo-realistic images with much better perceptual quality and richer fine details.
TransTrack: Multiple-Object Tracking with Transformer
This work proposes TransTrack, a baseline for MOT with Transformer, and introduces a set of learned object queries into the pipeline to enable detection of newly appearing objects. It demonstrates a simple and effective method based on the query-key mechanism that achieves a competitive 65.8% MOTA on the MOT17 challenge dataset.
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
- Mingyu Ding, Yuqi Huo, P. Luo
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 10 December 2019
D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation, where the filters and their receptive fields can be automatically learned from image-based depth maps.
Parser-Free Virtual Try-on via Distilling Appearance Flows
- Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, P. Luo
- Computer Science, Computer Vision and Pattern Recognition
- 8 March 2021
This work proposes a novel approach, "teacher-tutor-student" knowledge distillation, which is able to produce highly photo-realistic images without human parsing and has several appealing advantages over prior art.
Online Knowledge Distillation via Collaborative Learning
- Qiushan Guo, Xinjiang Wang, P. Luo
- Computer Science, Computer Vision and Pattern Recognition
- 1 June 2020
This work carefully designs multiple methods to generate soft targets as supervision by effectively ensembling the predictions of students and distorting the input images, consistently improving the generalization ability of deep neural networks (DNNs) that have different learning capacities.
DetCo: Unsupervised Contrastive Learning for Object Detection
- Enze Xie, Jian Ding, P. Luo
- Computer Science, IEEE International Conference on Computer Vision
- 9 February 2021
Extensive experiments demonstrate that DetCo not only outperforms recent methods on a series of 2D and 3D instance-level detection tasks, but is also competitive on image classification.
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
- Xingang Pan, Xiaohang Zhan, Bo Dai, Dahua Lin, Chen Change Loy, P. Luo
- Computer Science, IEEE Transactions on Pattern Analysis and Machine…
- 30 March 2020
This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. It allows the generator to be fine-tuned on the fly in a progressive manner, regularized by the feature distance obtained from the GAN's discriminator.
Segmenting Transparent Object in the Wild with Transformer
- Enze Xie, Wenjia Wang, P. Luo
- Computer Science
- 2021
A novel Transformer-based segmentation pipeline termed Trans2Seg is proposed, which significantly outperforms all the CNN-based methods, showing the proposed algorithm’s potential ability to solve transparent object segmentation.