• Corpus ID: 239016352

Vit-GAN: Image-to-image Translation with Vision Transformes and Conditional GANS

  title={Vit-GAN: Image-to-image Translation with Vision Transformes and Conditional GANS},
  author={Yiugit Gunducc},
In this paper, we have developed a general-purpose architecture, Vit-Gan, capable of performing most of the image-to-image translation tasks from semantic image segmentation to single image depth perception. This paper is a follow-up paper, an extension of generator based model [1] in which the obtained results were very promising. This opened the possibility of further improvements with adversarial architecture. We used a unique vision transformers-based generator architecture and Conditional… 

Figures and Tables from this paper


Image-to-Image Translation with Conditional Adversarial Networks
Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Improved Techniques for Training GANs
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Tensor-to-Image: Image-to-Image Translation with Vision Transformers
A vision transformerbased custom-designed model, tensor-to-image, for the image to image translation with the help of self-attention was able to generalize and apply to different problems without a single modification.
Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks
Markovian Generative Adversarial Networks (MGANs) are proposed, a method for training generative networks for efficient texture synthesis that surpasses previous neural texture synthesizers by a significant margin and applies to texture synthesis, style transfer, and video stylization.
Colorful Image Colorization
This paper proposes a fully automatic approach to colorization that produces vibrant and realistic colorizations and shows that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder.
Conditional Generative Adversarial Nets
The conditional version of generative adversarial nets is introduced, which can be constructed by simply feeding the data, y, to the generator and discriminator, and it is shown that this model can generate MNIST digits conditioned on class labels.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Frechet Inception Distance" (FID) which captures the similarity of generated images to real ones better than the Inception Score.
U-Net: Convolutional Networks for Biomedical Image Segmentation
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
VisualBERT: A Simple and Performant Baseline for Vision and Language
Analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a