• Publications
Generative Image Inpainting with Contextual Attention
TLDR
This work proposes a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.
Free-Form Image Inpainting With Gated Convolution
TLDR
The proposed gated convolution solves the issue of vanilla convolution, which treats all input pixels as valid, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers.
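The gating mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it uses 1x1 convolutions (plain matrix products per pixel) purely to show the feature/gate split, and the function and weight names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv(x, w_feat, w_gate):
    """Sketch of gated convolution with 1x1 kernels.

    x: (H, W, C_in) input features; w_feat, w_gate: (C_in, C_out) weights.
    The gate branch produces a learnable soft validity mask in (0, 1)
    for every output channel at every spatial location.
    """
    feat = np.tanh(x @ w_feat)       # candidate features
    gate = sigmoid(x @ w_gate)       # learned soft feature-selection mask
    return feat * gate               # gated output
```

The same pattern extends to arbitrary kernel sizes; only the two linear maps change.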
Universal Style Transfer via Feature Transforms
TLDR
The key ingredient of the method is a pair of feature transforms, whitening and coloring, embedded in an image reconstruction network, which directly match the feature covariance of the content image to that of a given style image.
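The whitening-and-coloring transform can be sketched directly on flattened feature maps. The following is a simplified NumPy version under the usual formulation (zero-center, whiten the content covariance to identity, then color with the style covariance and add back the style mean); the small `eps` regularizer is an assumption for numerical stability:

```python
import numpy as np

def whiten_color(content, style, eps=1e-5):
    """Whitening-coloring transform on (C, N) feature maps.

    content, style: features with C channels and N spatial positions.
    Returns content features re-colored to carry the style covariance.
    """
    # Whitening: remove the content feature covariance.
    fc = content - content.mean(1, keepdims=True)
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(fc.shape[0])
    Uc, sc, _ = np.linalg.svd(cov_c)
    whitened = Uc @ np.diag(sc ** -0.5) @ Uc.T @ fc

    # Coloring: impose the style covariance and mean.
    sm = style.mean(1, keepdims=True)
    fs = style - sm
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(fs.shape[0])
    Us, ss, _ = np.linalg.svd(cov_s)
    return Us @ np.diag(ss ** 0.5) @ Us.T @ whitened + sm
```

After the transform, the output's covariance matches the style's, which is what lets a fixed decoder render the stylized result without per-style training.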
MAttNet: Modular Attention Network for Referring Expression Comprehension
TLDR
This work proposes to decompose expressions into three modular components related to subject appearance, location, and relationship to other objects, which allows the model to flexibly adapt to expressions containing different types of information in an end-to-end framework.
Decomposing Motion and Content for Natural Video Sequence Prediction
TLDR
To the best of our knowledge, this is the first end-to-end trainable network architecture with motion and content separation to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
TLDR
An encoder-decoder network with a novel projection loss defined by a projective transformation enables unsupervised learning from 2D observations without explicit 3D supervision, and shows superior performance and better generalization for 3D object reconstruction when the projection loss is used.
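The idea of supervising a 3D prediction with only 2D observations can be sketched in a few lines. Below is a deliberately simplified variant: it uses an orthographic max-projection along the depth axis instead of the paper's perspective transformer, and the mean-squared silhouette comparison is an assumption for illustration:

```python
import numpy as np

def silhouette_projection_loss(voxels, silhouette):
    """Compare a projected voxel grid against an observed 2D silhouette.

    voxels: (D, H, W) occupancy probabilities in [0, 1].
    silhouette: (H, W) binary mask from the 2D observation.
    A max along depth approximates "any occupied voxel along the ray".
    """
    projection = voxels.max(axis=0)          # orthographic silhouette
    return float(np.mean((projection - silhouette) ** 2))
```

Minimizing this loss over many viewpoints pushes the predicted occupancy toward shapes whose projections match every observed mask, which is what removes the need for 3D ground truth.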
Attribute2Image: Conditional Image Generation from Visual Attributes
TLDR
A layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder is developed and shows excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
Deep Interactive Object Selection
TLDR
This paper presents a novel deep-learning-based algorithm that has a much better understanding of objectness, reduces user interactions to just a few clicks, and is superior to all existing interactive object selection approaches.
Generative Face Completion
TLDR
This paper demonstrates qualitatively and quantitatively that the proposed face completion algorithm can handle large regions of missing pixels in arbitrary shapes and generate realistic completion results.
Salient Color Names for Person Re-identification
TLDR
This paper proposes a novel salient color names based color descriptor (SCNCD) to describe colors, which outperforms the state of the art (without user feedback optimization) on two challenging datasets (VIPeR and PRID 450S).
...