Corpus ID: 14989939

Conditional Image Generation with PixelCNN Decoders

@inproceedings{Oord2016ConditionalIG,
  title={Conditional Image Generation with PixelCNN Decoders},
  author={A{\"a}ron van den Oord and Nal Kalchbrenner and Lasse Espeholt and Koray Kavukcuoglu and Oriol Vinyals and Alex Graves},
  booktitle={NIPS},
  year={2016}
}
This work explores conditional image generation with a new image density model based on the PixelCNN architecture. [...] Key Result: the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
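The gated convolutional layer mentioned above combines a tanh branch and a sigmoid branch, with the conditioning signal entering both as an additive bias. A minimal numpy sketch of just the gating, assuming illustrative feature-map shapes (this is not the paper's full masked-convolution implementation):

```python
import numpy as np

def gated_activation(feat_f, feat_g, cond_f=0.0, cond_g=0.0):
    """Gated unit: y = tanh(feat_f + cond_f) * sigmoid(feat_g + cond_g).
    feat_* would come from masked convolutions; cond_* are biases derived
    from the conditioning input (e.g. a class-label embedding)."""
    sigmoid = 1.0 / (1.0 + np.exp(-(feat_g + cond_g)))
    return np.tanh(feat_f + cond_f) * sigmoid

# toy 4x4 single-channel feature maps (illustrative values only)
rng = np.random.default_rng(0)
y = gated_activation(rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
```

Because tanh is bounded in (-1, 1) and the sigmoid gate in (0, 1), the output stays bounded, which is part of what makes the unit behave like a multiplicative gate rather than a plain nonlinearity.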
Image Transformer
TLDR
This work generalizes a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood, and significantly increases the size of images the model can process in practice, while maintaining significantly larger receptive fields per layer than typical convolutional neural networks.
PixelVAE++: Improved PixelVAE with Discrete Prior
Constructing powerful generative models for natural images is a challenging task. PixelCNN models capture details and local information in images very well but have limited receptive field.
Generate High Resolution Images With Generative Variational Autoencoder
TLDR
This work replaces the decoder of a VAE with a discriminator while keeping the encoder as is, combining the advantages of generative models and inference models in a principled Bayesian manner.
Image Generation and Editing with Variational Info Generative Adversarial Networks
TLDR
A new model called Variational InfoGAN (ViGAN) is proposed that is able to generate new images conditioned on visual descriptions and to modify an image by fixing its latent representation and varying the visual description.
Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
TLDR
The Subscale Pixel Network (SPN) is proposed, a conditional decoder architecture that generates an image as a sequence of sub-images of equal size that compactly captures image-wide spatial dependencies and requires a fraction of the memory and the computation required by other fully autoregressive models.
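The "sequence of sub-images of equal size" can be pictured as strided slices of the pixel grid. A small numpy sketch of that decomposition, assuming a hypothetical helper name and stride:

```python
import numpy as np

def subscale_slices(image, s=2):
    """Split an HxW image into s*s equally sized sub-images by strided
    sampling; an SPN-style model would generate these slice after slice."""
    return [image[i::s, j::s] for i in range(s) for j in range(s)]

img = np.arange(16).reshape(4, 4)
subs = subscale_slices(img, 2)   # four 2x2 sub-images covering every pixel once
```

Each sub-image already spans the full spatial extent of the original, which is why early slices can capture image-wide dependencies compactly.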
Text-to-Image Generation with Attention Based Recurrent Neural Networks
TLDR
A tractable and stable caption-based image generation model that uses an attention-based encoder to learn word-to-pixel dependencies and a conditional autoregressive decoder for generating images.
Hierarchical Autoregressive Image Models with Auxiliary Decoders
TLDR
It is shown that autoregressive models conditioned on discrete representations of images, which abstract away local detail, can produce high-fidelity reconstructions of images and can be trained to produce samples with large-scale coherence.
Object Estimator Image Decoder Object Encoder Objects Fuser
Despite significant recent progress on generative models, controlled generation of images depicting multiple and complex object layouts is still a difficult problem. Among the core challenges are the
PixelGAN Autoencoders
TLDR
It is shown how the PixelGAN autoencoder with a categorical prior can be directly used in semi-supervised settings and achieve competitive semi-supervised classification results on the MNIST, SVHN and NORB datasets.
EdiBERT, a generative model for image editing
TLDR
This paper proposes EdiBERT, a bidirectional transformer trained in the discrete latent space built by a vector-quantized auto-encoder, and argues that such a bidirectional model is suited to image manipulation since any patch can be re-sampled conditionally on the whole image.

References

Showing 1-10 of 39 references
Generative Image Modeling Using Spatial LSTMs
TLDR
This work introduces a recurrent image model based on multidimensional long short-term memory units, which are particularly suited to image modeling due to their spatial structure; it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
Generative Adversarial Text to Image Synthesis
TLDR
A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
Pixel Recurrent Neural Networks
TLDR
A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
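The raster-scan dependency structure described here is what masked convolutions later enforce in PixelCNN-style models: a kernel may only see pixels above the center row, plus those to the left in the center row. A small numpy sketch of the standard type-A/type-B masks (the helper name is ours, not from the paper):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """k x k mask for a masked convolution. Type 'A' (first layer) excludes
    the current pixel; type 'B' (later layers) includes it."""
    m = np.zeros((k, k))
    c = k // 2
    m[:c, :] = 1.0            # all rows above the center
    m[c, :c] = 1.0            # left of the center in the center row
    if mask_type == "B":
        m[c, c] = 1.0         # type B also sees the current position
    return m

# causal_mask(3, "A") =
# [[1, 1, 1],
#  [1, 0, 0],
#  [0, 0, 0]]
```

Multiplying a convolution kernel elementwise by this mask guarantees the raster-order factorization of the joint pixel distribution.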
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
TLDR
A generative parametric model capable of producing high quality samples of natural images using a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion.
Locally-connected transformations for deep GMMs
TLDR
This work extends and applies Deep Gaussian Mixture Models (DGMMs) to image modeling by introducing locally-connected transformations, shows the benefits of using locally-connected deep GMMs, and gives new insights into modeling higher-dimensional images.
The Student-t Mixture as a Natural Image Patch Prior with Application to Image Compression
TLDR
This work demonstrates that the Student-t mixture model convincingly surpasses GMMs in terms of log likelihood, achieving performance competitive with the state of the art in image patch modeling, and proposes efficient coding schemes that can easily be extended to other unsupervised machine learning models.
DRAW: A Recurrent Neural Network For Image Generation
TLDR
The Deep Recurrent Attentive Writer neural network architecture for image generation substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.
Factoring Variations in Natural Images with Deep Gaussian Mixture Models
TLDR
This paper proposes a new scalable deep generative model for images, called the Deep Gaussian Mixture Model, that is a straightforward but powerful generalization of GMMs to multiple layers, and shows that deeper GMM architectures generalize better than more shallow ones.
Generating Images from Captions with Attention
TLDR
It is demonstrated that the proposed model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
A Neural Algorithm of Artistic Style
TLDR
This work introduces an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality and offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.