Rethinking the Truly Unsupervised Image-to-Image Translation

Kyungjune Baek, Yunjey Choi, Youngjung Uh, Jaejun Yoo, Hyunjung Shim. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
Every recent image-to-image translation model inherently requires either image-level (i.e. input-output pairs) or set-level (i.e. domain labels) supervision. However, even set-level supervision can be a severe bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, i.e., neither paired images nor domain labels. To this end, we propose a truly unsupervised image-to-image translation model (TUNIT) that simultaneously learns… 

Contrastive Learning for Unsupervised Image-to-Image Translation

An unsupervised image-to-image translation method based on contrastive learning that learns a discriminator to differentiate between distinctive styles and lets the discriminator supervise a generator to transfer those styles across images.

Leveraging Local Domains for Image-to-Image Translation

This paper leverages human knowledge about spatial domain characteristics, which it refers to as 'local domains', and demonstrates its benefit for image-to-image translation, showing that all tested proxy tasks are significantly improved without ever seeing the target domain during training.

Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation

A new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space in which both intra- and inter-domain interpolations correspond to gradual changes in the generated images and the content of the source image is better preserved during the translation.

LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

A LANguage-driven Image-to-image Translation model, dubbed LANIT, that achieves comparable or superior performance to existing models and introduces a slack domain to cover samples that are not covered by the candidate domains.

Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes

This paper explores multiple frameworks that rely on different paradigms and assess how one of these that has initially been developed for single object translation performs on more diverse and content-rich images.

A Style-aware Discriminator for Controllable Image Translation

A style-aware discriminator that acts as both a critic and a style encoder to provide conditions, learns a controllable style space via prototype-based self-supervised learning, and simultaneously guides the generator.

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC), which enforces a spatial perturbation function and the translation operator to be commutative (i.e., T ◦ G = G ◦ T).
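
The commutativity constraint above can be illustrated with a toy sketch. Here T is a horizontal flip and G is a hypothetical pixelwise "translator" (a brightness shift), not the paper's networks; a pixelwise G commutes with any spatial permutation, so the penalty vanishes.

```python
# Toy sketch of the MSPC constraint T(G(x)) == G(T(x)).
# T: spatial perturbation (horizontal flip); G: hypothetical pixelwise
# translator (brightness shift). Images are plain nested lists.

def T(img):
    # spatial perturbation: flip each row left-to-right
    return [row[::-1] for row in img]

def G(img):
    # toy translator: add a constant brightness offset to every pixel
    return [[p + 0.1 for p in row] for row in img]

def mspc_loss(img):
    # mean L1 distance between T(G(x)) and G(T(x))
    a, b = T(G(img)), G(T(img))
    flat = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(abs(x - y) for x, y in flat) / len(flat)

img = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
print(mspc_loss(img))  # pixelwise G commutes with the flip -> 0.0
```

A real generator is not pixelwise, which is exactly why the paper needs the loss as a regularizer rather than getting commutativity for free.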

Multimodal Image-to-Image Translation via a Single Generative Adversarial Network

Qualitative and quantitative results over a wide range of datasets against several counterparts and variants of the SoloGAN model demonstrate the merits of the method, especially for the challenging I2I translation tasks, i.e., tasks that involve extreme shape variations or need to keep the complex backgrounds unchanged after translations.

Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation

A new negative Pruning technology for Unpaired image-to-image Translation (PUT) is introduced, which sparsifies and ranks the patches; the proposed algorithm is efficient and flexible, and enables the model to stably learn essential information between corresponding patches.

The Spatially-Correlative Loss for Various Image Translation Tasks

This work proposes a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation, and introduces a new self-supervised learning method to explicitly learn spatially-correlative maps for each specific translation task.
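
The key idea, comparing structure through each image's own patch self-similarity map rather than its raw appearance, can be sketched as follows. The per-patch feature vectors here are toy stand-ins, not the learned representations from the paper.

```python
# Toy sketch of a spatially-correlative (self-similarity) loss: structure
# is compared via each image's OWN patch-to-patch cosine-similarity map,
# so two images can match structurally even with different appearance.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def self_similarity(feats):
    # pairwise cosine-similarity map over one image's patch features
    return [[cosine(u, v) for v in feats] for u in feats]

def structure_loss(feats_a, feats_b):
    # mean L1 distance between the two self-similarity maps
    sa, sb = self_similarity(feats_a), self_similarity(feats_b)
    return sum(abs(x - y) for ra, rb in zip(sa, sb)
               for x, y in zip(ra, rb)) / (len(sa) ** 2)

day   = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
night = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # same structure, new intensity
print(structure_loss(day, night))  # cosine is scale-invariant -> loss ~ 0
```

Because cosine similarity ignores feature magnitude, a global appearance change leaves the self-similarity map, and hence the loss, essentially untouched.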

Semi-Supervised Learning for Few-Shot Image-to-Image Translation

This work proposes applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure, and applies a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or an external one.

Unsupervised Image-to-Image Translation Networks

This work makes a shared-latent space assumption and proposes an unsupervised image-to-image translation framework based on Coupled GANs that achieves state-of-the-art performance on benchmark datasets.

Multimodal Unsupervised Image-to-Image Translation

A Multimodal Unsupervised Image-to-image Translation (MUNIT) framework that assumes that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties.
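
The decomposition assumption above can be sketched with a toy model in which an "image" is just a (content, style) pair; translation keeps the source content and borrows a target-domain style. The encoder, decoder, and image representation here are all hypothetical illustrations, not MUNIT's actual networks.

```python
# Toy sketch of the MUNIT assumption: an image decomposes into a
# domain-invariant content code and a domain-specific style code,
# and translation recombines source content with a target style.

def encode(image):
    # toy encoder: the image IS its (content, style) pair
    content, style = image
    return content, style

def decode(content, style):
    # toy decoder: recombine the two codes into an "image"
    return (content, style)

def translate(src_image, tgt_image):
    content, _ = encode(src_image)       # keep the source content
    _, tgt_style = encode(tgt_image)     # borrow the target style
    return decode(content, tgt_style)

sketch = ("cat-outline", "sketch-style")
photo = ("dog-outline", "photo-style")
print(translate(sketch, photo))  # -> ('cat-outline', 'photo-style')
```

Sampling different style codes for a fixed content code is what makes the translation multimodal: one input image maps to many plausible outputs.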

Toward Multimodal Image-to-Image Translation

This work aims to model a distribution of possible outputs in a conditional generative modeling setting that helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse.

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

This work presents an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples, and introduces a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
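
The cycle-consistency loss can be illustrated with a toy pair of mappings. G and F below are hypothetical scalar stand-ins for the two translation networks; when F exactly inverts G, the loss is zero, which is the fixed point the training objective pushes toward.

```python
# Toy illustration of the cycle-consistency loss ||F(G(x)) - x||_1.
# G: source -> target translator; F: target -> source translator.
# Both are hypothetical affine stand-ins for the real networks.

def G(x):
    return 2.0 * x + 1.0          # toy forward translator

def F(y):
    return (y - 1.0) / 2.0        # toy backward translator (inverts G)

def cycle_loss(xs):
    # mean L1 cycle-consistency loss over a batch of scalar "images"
    return sum(abs(F(G(x)) - x) for x in xs) / len(xs)

batch = [0.0, 1.5, -2.0, 7.25]
print(cycle_loss(batch))  # F is the exact inverse of G -> 0.0
```

In the paper the same penalty is applied in both directions, ||F(G(X)) − X|| and ||G(F(Y)) − Y||, alongside the adversarial losses.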

One-Shot Unsupervised Cross Domain Translation

This work argues that this task could be a key AI capability that underlines the ability of cognitive agents to act in the world and presents empirical evidence that the existing unsupervised domain translation methods fail on this task.

Diverse Image-to-Image Translation via Disentangled Representations

This work presents an approach based on disentangled representation for producing diverse outputs without paired training images, and proposes to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space.

StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation

A unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network, which leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain.

Exploring Unlabeled Faces for Novel Attribute Discovery

This work uses prior knowledge about the visual world as guidance to discover novel attributes and transfer them via a novel normalization method, and shows that the method trained on unlabeled data produces high-quality translations that preserve identity and are perceptually realistic.

DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

This work presents an approach based on disentangled representations for generating diverse outputs without paired training images, and can generate diverse and realistic images on a wide range of tasks without paired training data.