• Publications
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
TLDR
An end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image.
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
TLDR
The proposed Soft Filter Pruning (SFP) method enables the pruned filters to be updated when training the model after pruning, which has two advantages over previous works: larger model capacity and less dependence on the pretrained model.
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
TLDR
This paper deploys a pure transformer to encode an image as a sequence of patches, termed SEgmentation TRansformer (SETR), and shows that SETR achieves a new state of the art on ADE20K and Pascal Context, and competitive results on Cityscapes.
Transductive Multi-View Zero-Shot Learning
TLDR
A novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space that rectifies the projection shift between the auxiliary and target domains, exploits the complementarity of multiple semantic representations, and significantly outperforms existing methods for both zero-shot and N-shot recognition.
Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation
TLDR
This paper proposes a novel framework, transductive multi-view embedding, that rectifies the projection shift between the auxiliary and target domains, exploits the complementarity of multiple semantic representations, achieves state-of-the-art recognition results on image and video benchmark datasets, and enables novel cross-view annotation tasks.
Multi-View Video Summarization
TLDR
A spatio-temporal shot graph is constructed and the summarization problem is formulated as a graph labeling task, which encodes the correlations with different attributes among multi-view video shots in hyperedges and generates a result based on shot importance evaluated using a Gaussian entropy fusion scheme.
AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding
TLDR
In this dataset, rich annotations bridge the semantic gap between low-level images and high-level concepts, and the dataset serves as an effective benchmark to evaluate and improve different computational methods.
Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking
TLDR
The two major novelties, chained structure and paired attentive regression, make CTracker simple, fast, and effective, setting new MOTA records on the MOT16 and MOT17 challenge datasets (67.6 and 66.6, respectively) without relying on any extra training data.
Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation
TLDR
This model learns to predict a series of deformations that iteratively improve a coarse shape, and it generalizes across different semantic categories, numbers of input images, and qualities of mesh initialization.
Pose-Normalized Image Generation for Person Re-identification
TLDR
This work proposes a novel deep person image generation model that synthesizes realistic person images conditioned on pose. The model is based on a generative adversarial network designed specifically for pose normalization in re-id, and is thus termed pose-normalization GAN (PN-GAN).
...