Combining Transformer Generators with Convolutional Discriminators

Ricard Durall, Stanislav Frolov, Andreas R. Dengel, Janis Keuper
Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only transformer-based architectures and achieves competitive results when compared to convolutional… 
ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation
This work presents a GAN Transformer framework for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions, and demonstrates adaptability to various human motion representations.

TransGAN: Two Transformers Can Make One Strong GAN
This paper conducts the first pilot study on building a GAN completely free of convolutions, using only pure transformer-based architectures; the best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones.
Image Transformer
This work generalizes a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood, and significantly increases the size of images the model can process in practice, while maintaining significantly larger receptive fields per layer than typical convolutional neural networks.
Self-Attention Generative Adversarial Networks
The proposed SAGAN achieves state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset.
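The core operation behind SAGAN and the transformer-based generators above is scaled dot-product self-attention: every query position attends over all key positions and averages the values with softmax weights. As a rough, dependency-free illustration (not SAGAN's actual implementation, which operates on convolutional feature maps), a minimal sketch over toy lists:

```python
import math

def attention(Q, K, V):
    """Scaled dot-product self-attention: for each query vector,
    score it against all keys, normalize the scores with a softmax,
    and return the weighted average of the value vectors."""
    d = len(Q[0])  # key/query dimensionality
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A query aligned with the first key puts more weight on the first value.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In SAGAN this module is inserted into both generator and discriminator so that each spatial location can use cues from distant locations, which pure convolutions only capture through deep stacks of layers.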
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
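ViT's key preprocessing step is to split an image into non-overlapping 16x16 patches and treat each flattened patch as a token ("word") for the transformer. A minimal sketch of that patchification on a toy image (pure Python, omitting the learned linear projection and position embeddings):

```python
def image_to_patches(img, patch):
    """Split an H x W image (list of rows) into non-overlapping
    patch x patch tiles and flatten each tile into a vector,
    mirroring how ViT turns an image into a token sequence."""
    h, w = len(img), len(img[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [img[top + i][left + j]
                    for i in range(patch) for j in range(patch)]
            tokens.append(tile)
    return tokens

# A 4x4 toy image with 2x2 patches yields 4 tokens of length 4;
# the first token covers the top-left tile: [0, 1, 4, 5].
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(img, 2)
```

In the real model each token is then linearly projected to the transformer's hidden dimension, so a 224x224 image becomes a sequence of 196 patch embeddings.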
AutoGAN: Neural Architecture Search for Generative Adversarial Networks
This paper presents the first preliminary study on introducing neural architecture search (NAS) to generative adversarial networks (GANs), dubbed AutoGAN, and discovers architectures that achieve highly competitive performance compared to current state-of-the-art hand-crafted GANs.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
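The truncation trick itself is simple: at sampling time, latent coordinates drawn from a standard normal are rejected and redrawn until they fall inside a threshold, shrinking the input variance. A minimal sketch (illustrative only, not BigGAN's code):

```python
import random

def truncated_normal(dim, threshold=0.5, seed=0):
    """BigGAN-style truncation trick: resample each standard-normal
    latent coordinate until it lies in [-threshold, threshold].
    Smaller thresholds trade sample variety for fidelity."""
    rng = random.Random(seed)
    z = []
    for _ in range(dim):
        x = rng.gauss(0.0, 1.0)
        while abs(x) > threshold:
            x = rng.gauss(0.0, 1.0)
        z.append(x)
    return z

z = truncated_normal(8, threshold=0.5)
print(all(abs(v) <= 0.5 for v in z))  # True
```

Because the generator never saw truncated latents during training, BigGAN's orthogonal regularization is what keeps sample quality high under this distribution shift.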
Generative Adversarial Transformers
The GANformer is introduced, a novel and efficient type of transformer explored for the task of visual generative modeling; it uses multiplicative integration to allow flexible region-based modulation, and can thus be seen as a multi-latent generalization of the successful StyleGAN network.
Improved Techniques for Training GANs
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions
This paper proposes adding a novel spectral regularization term to the training objective and shows that this approach not only allows training spectrally consistent GANs that avoid high-frequency errors, but also that a correct approximation of the frequency spectrum improves the training stability and output quality of generative networks.
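The spectral regularizer compares the frequency spectra of real and generated images, obtained via a Fourier transform. As a rough, dependency-free illustration of the underlying quantity (reduced to 1-D; the paper works on 2-D image spectra), a naive DFT power spectrum:

```python
import cmath

def power_spectrum(signal):
    """Naive discrete Fourier transform power spectrum of a 1-D signal.
    A spectral regularizer of the kind proposed in the paper penalizes
    mismatches between such spectra of real and generated data."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))) ** 2 / n
            for k in range(n)]

# A constant signal has all its power in the DC (k = 0) bin,
# i.e. no high-frequency content for the regularizer to penalize.
spec = power_spectrum([1.0, 1.0, 1.0, 1.0])
```

The paper's observation is that up-convolutions systematically distort the high-frequency end of this spectrum, which both a classifier and the regularizer can exploit.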
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.