Image Transformer
@article{Parmar2018ImageT, title={Image Transformer}, author={Niki Parmar and Ashish Vaswani and Jakob Uszkoreit and Lukasz Kaiser and Noam M. Shazeer and Alexander Ku and Dustin Tran}, journal={ArXiv}, year={2018}, volume={abs/1802.05751} }
Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. [...] Key Result: In a human evaluation study, we show that our super-resolution models improve significantly over previously published autoregressive super-resolution models. Images they generate fool human observers three times more often than the previous state of the art.
593 Citations
MaskGIT: Masked Generative Image Transformer
- Computer Science
- 2022
This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, and demonstrates that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
Attention Augmented Convolutional Networks
- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
It is found that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar.
Improved Transformer for High-Resolution GANs
- Computer Science
- NeurIPS
- 2021
The proposed HiT is an important milestone for GAN generators that are completely free of convolutions; it has nearly linear computational complexity with respect to image size and thus scales directly to synthesizing high-definition images.
Vision Transformer with Progressive Sampling
- Computer Science
- 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
An iterative and progressive sampling strategy is proposed to locate discriminative regions; when combined with the Vision Transformer, the resulting PS-ViT network can adaptively learn where to look.
Self-Attention Generative Adversarial Networks
- Computer Science
- ICML
- 2019
The proposed SAGAN achieves state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset.
Locally Masked Convolution for Autoregressive Models
- Computer Science
- UAI
- 2020
LMConv is introduced: a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image, achieving improved performance on whole-image density estimation and globally coherent image completions.
Face sketch-to-photo transformation with multi-scale self-attention GAN
- Computer Science
- Neurocomputing
- 2020
ViViT: A Video Vision Transformer
- Computer Science
- 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work shows how to effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets, and achieves state-of-the-art results on multiple video classification benchmarks.
XCiT: Cross-Covariance Image Transformers
- Computer Science
- NeurIPS
- 2021
This work proposes a “transposed” version of self-attention that operates across feature channels rather than tokens, with interactions based on the cross-covariance matrix between keys and queries. It has linear complexity in the number of tokens and allows efficient processing of high-resolution images.
Vector-quantized Image Modeling with Improved VQGAN
- Computer Science
- ArXiv
- 2021
This work presents a Vector-quantized Image Modeling (VIM) approach that pretrains a Transformer to predict rasterized image tokens autoregressively, and proposes multiple improvements over vanilla VQGAN, from architecture to codebook learning, yielding better efficiency and reconstruction fidelity.
References
SHOWING 1-10 OF 25 REFERENCES
Conditional Image Generation with PixelCNN Decoders
- Computer Science
- NIPS
- 2016
The gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
Generative Image Modeling Using Spatial LSTMs
- Computer Science
- NIPS
- 2015
This work introduces a recurrent image model based on multidimensional long short-term memory units, which are particularly suited to image modeling because of their spatial structure. The model outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
PixelSNAIL: An Improved Autoregressive Generative Model
- Computer Science
- ICML
- 2018
This work introduces a new generative model architecture that combines causal convolutions with self attention and presents state-of-the-art log-likelihood results on CIFAR-10 and ImageNet.
The student-t mixture as a natural image patch prior with application to image compression
- Computer Science
- J. Mach. Learn. Res.
- 2014
This work demonstrates that the Student-t mixture model convincingly surpasses GMMs in terms of log likelihood, achieving performance competitive with the state of the art in image patch modeling, and proposes efficient coding schemes that can easily be extended to other unsupervised machine learning models.
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented. To the authors' knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. The work also proposes a perceptual loss function that consists of an adversarial loss and a content loss.
Attention is All you Need
- Computer Science
- NIPS
- 2017
A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
Generating Images from Captions with Attention
- Computer Science
- ICLR
- 2016
It is demonstrated that the proposed model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
- Computer Science
- ICLR
- 2016
This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs) that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
BEGAN: Boundary Equilibrium Generative Adversarial Networks
- Computer Science
- ArXiv
- 2017
This work proposes a new equilibrium-enforcing method, paired with a loss derived from the Wasserstein distance, for training auto-encoder-based Generative Adversarial Networks. The method provides a new approximate convergence measure, fast and stable training, and high visual quality.