• Corpus ID: 49585114


  author={Andrej Karpathy},
  • A. Karpathy
  • Published 2016
  • Computer Science, Environmental Science
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on… 

Figures and Tables from this paper


Conditional Image Generation with PixelCNN Decoders
The gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
Locally-connected transformations for deep GMMs
This work extends and applies Deep Gaussian Mixture Models (DGMMs) to this task, by introducing locally connected transformations and shows the benefits of using locally-connected Deep GMMs and gives new insights on modeling higher dimensional images.
Generative Image Modeling Using Spatial LSTMs
This work introduces a recurrent image model based on multidimensional long short-term memory units which is particularly suited for image modeling due to their spatial structure and outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
U-Net: Convolutional Networks for Biomedical Image Segmentation
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Pixel Recurrent Neural Networks
A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
NICE: Non-linear Independent Components Estimation
We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is
Wide Residual Networks
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture where the depth and width of residual networks are decreased and the resulting network structures are called wide residual networks (WRNs), which are far superior over their commonly used thin and very deep counterparts.
Video Pixel Networks
A probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video and generalizes to the motion of novel objects.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Towards Conceptual Compression
A simple recurrent variational auto-encoder architecture that significantly improves image modeling and shows that it naturally separates global conceptual information from lower level details, thus addressing one of the fundamentally desired properties of unsupervised learning.