• Publications
StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation
TLDR
StarGAN's unified model architecture allows simultaneous training on multiple datasets with different domains within a single network, which leads to higher-quality translated images than existing models and the novel capability of flexibly translating an input image to any desired target domain.
StarGAN v2: Diverse Image Synthesis for Multiple Domains
TLDR
StarGAN v2 is proposed as a single framework that addresses two limitations of existing image-to-image translation methods, limited diversity and the need for multiple models to cover all domains, and shows significantly improved results over the baselines.
Overcoming Catastrophic Forgetting by Incremental Moment Matching
TLDR
IMM incrementally matches the moments of the posterior distributions of neural networks trained on the first and the second task, respectively. To make the search space of the posterior parameters smooth, the IMM procedure is complemented by transfer learning techniques.
Dual Attention Networks for Multimodal Reasoning and Matching
TLDR
This work proposes Dual Attention Networks (DANs), which jointly leverage visual and textual attention mechanisms to capture the fine-grained interplay between vision and language, and introduces two types of DANs for multimodal reasoning and matching, respectively.
Photorealistic Style Transfer via Wavelet Transforms
TLDR
This work proposes a wavelet corrected transfer based on whitening and coloring transforms (WCT2) that allows features to preserve their structural information and the statistical properties of the VGG feature space during stylization, and provides stable video stylization without temporal constraints.
DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder
TLDR
DialogWAE, a conditional Wasserstein autoencoder designed for dialogue modeling, is proposed; it models the distribution of data by training a GAN within the latent variable space and uses a Gaussian mixture prior network to enrich the latent space.
Multimodal Residual Learning for Visual QA
TLDR
This work presents Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning.
Phase-aware Speech Enhancement with Deep Complex U-Net
TLDR
This work proposes a novel loss function, the weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure, and the proposed approach is reported to achieve state-of-the-art performance in all metrics.
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
TLDR
This paper proposes SGDP and AdamP, a simple and effective remedy that removes the radial component, or norm-increasing direction, at each optimizer step. Because of scale invariance, this alters only the effective step sizes without changing the effective update directions, thus preserving the original convergence properties of GD optimizers.
Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs
TLDR
This paper proposes a novel self-supervised auxiliary learning method for graph neural networks on heterogeneous graphs that uses meta-paths, which are composite relations of multiple edge types; the method can be viewed as a type of meta-learning.
...