• Corpus ID: 235421596

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

@inproceedings{Jiang2021TransGANTP,
  title={TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up},
  author={Yifan Jiang and Shiyu Chang and Zhangyang Wang},
  booktitle={NeurIPS},
  year={2021}
}
The recent explosive interest in transformers has suggested their potential to become powerful “universal” models for computer vision tasks, such as classification, detection, and segmentation. While those attempts mainly study discriminative models, we explore transformers on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs). Our goal is to conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer…
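To make the abstract's claim concrete, the sketch below illustrates, in plain NumPy, the two convolution-free building blocks a pure-transformer generator of this kind rests on: self-attention for global token mixing, and pixel-shuffle-style reshaping to trade channel depth for spatial resolution. This is a toy illustration under assumed shapes (a 4x4 token grid of dimension 16), not the paper's actual architecture; all names (`self_attention`, `pixel_shuffle`) are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v

def pixel_shuffle(tokens, h, w, r):
    """Trade channels for resolution: (h*w, c*r*r) -> (h*r * w*r, c)."""
    c = tokens.shape[-1] // (r * r)
    x = tokens.reshape(h, w, r, r, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h * r * w * r, c)

rng = np.random.default_rng(0)
d, h, w, r = 16, 4, 4, 2
tokens = rng.normal(size=(h * w, d))        # 4x4 grid of latent-derived tokens
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
mixed = self_attention(tokens, Wq, Wk, Wv)  # global mixing, no convolutions
upsampled = pixel_shuffle(mixed, h, w, r)   # 8x8 grid of 4-channel "pixels"
print(upsampled.shape)                      # (64, 4)
```

Stacking several such attention-then-upsample stages until the token grid reaches image resolution is the general shape of a convolution-free generator; the discriminator side can likewise consume the image as a patch-token sequence.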

Citations

Countering Malicious DeepFakes: Survey, Battleground, and Horizon
TLDR
A comprehensive overview and detailed analysis of research on DeepFake generation, DeepFake detection, and evasion of DeepFake detection, surveying more than 318 research papers.
Penetration Multilayer Overload Signal Generation Based on TransGAN
  • Anqi Fang, Rong Li
  • Engineering, Computer Science
    Journal of Physics: Conference Series
  • 2022
TLDR
Experimental results show that the TransGAN-based penetration multilayer overload signal generation method can generate effective overload data with different numbers of layers, which addresses the lack of penetration multilayer overload signals to a certain extent.
TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network
TLDR
TTS-GAN is introduced, a transformer-based GAN which can successfully generate realistic synthetic time-series data sequences of arbitrary length, similar to the real ones, using a pure transformer encoder architecture.
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
TLDR
This paper establishes a rigorous theoretical framework to analyze ViT features in the Fourier spectrum domain, and shows that the self-attention mechanism inherently amounts to a low-pass filter, which indicates that when a ViT scales up its depth, excessive low-pass filtering causes feature maps to preserve only their direct-current component.
A Universal Detection Method for Adversarial Examples and Fake Images
TLDR
Experimental results show that the proposed framework has good feasibility and effectiveness in detecting adversarial examples or fake images, and has good generalizability for the different datasets and model structures.
A review on AI in PET imaging
TLDR
This article reviews studies that applied deep learning techniques for image generation on PET and categorizes them into three themes: recovering full PET data from noisy data by denoising with deep learning, PET image reconstruction and attenuation correction with deep learning, and PET image translation and synthesis with deep learning.
ActiveMLP: An MLP-like Architecture with Active Token Mixer
TLDR
This paper presents ActiveMLP, a general MLP-like backbone for computer vision, and proposes an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given one.
Auto-scaling Vision Transformers without Training
TLDR
As-ViT is proposed, an auto-scaling framework for ViTs without training, which automatically discovers and scales up ViTs in an efficient and principled manner; a progressive tokenization strategy is also proposed to train ViTs faster and cheaper.
End-to-End Instance Edge Detection
TLDR
A novel transformer architecture is designed that efficiently combines an FPN and a transformer decoder to enable cross-attention on multi-scale high-resolution feature maps within a reasonable computation budget, and a lightweight dense prediction head is proposed that is applicable to both instance edge and mask detection.
Eye Gaze and Self-attention: How Humans and Transformers Attend Words in Sentences
Attention describes cognitive processes that are important to many human phenomena including reading. The term is also used to describe the way in which transformer neural networks perform natural…

References

SHOWING 1-10 OF 85 REFERENCES
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
TLDR
This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
TLDR
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
TLDR
This work proposes Nyströmformer - a model that exhibits favorable scalability as a function of sequence length and performs favorably relative to other efficient self-attention methods.
Analyzing and Improving the Image Quality of StyleGAN
TLDR
This work redesigns the generator normalization, revisits progressive growing, and regularizes the generator to encourage good conditioning in the mapping from latent codes to images, thereby redefining the state of the art in unconditional image modeling.
Differentiable Augmentation for Data-Efficient GAN Training
  • arXiv preprint arXiv:2006.10738
  • 2020
Progressive Growing of GANs for Improved Quality, Stability, and Variation
TLDR
A new training methodology for generative adversarial networks is described, starting from a low resolution, and adding new layers that model increasingly fine details as training progresses, allowing for images of unprecedented quality.
Colorization Transformer
TLDR
The Colorization Transformer is presented, a novel approach for diverse high-fidelity image colorization based on self-attention that outperforms the previous state of the art on colorizing ImageNet, based on FID results and on a human evaluation in a Mechanical Turk test.
Taming Transformers for High-Resolution Image Synthesis
TLDR
It is demonstrated how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.
A Style-Based Generator Architecture for Generative Adversarial Networks
TLDR
An alternative generator architecture for generative adversarial networks is proposed, borrowing from style transfer literature, that improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.