Share This Author
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Representation Learning with Contrastive Predictive Coding
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Neural Discrete Representation Learning
Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
Pixel Recurrent Neural Networks
A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
Conditional Image Generation with PixelCNN Decoders
- Aäron van den Oord, Nal Kalchbrenner, Lasse Espeholt, K. Kavukcuoglu, Oriol Vinyals, A. Graves
- Computer ScienceNIPS
- 16 June 2016
The gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
Generating Diverse High-Fidelity Images with VQ-VAE-2
It is demonstrated that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state of the art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GAN's known shortcomings such as mode collapse and lack of diversity.
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous…
Efficient Neural Audio Synthesis
A single-layer recurrent neural network with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model, the WaveRNN, and a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once.
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
- J. Uesato, Brendan O'Donoghue, Aäron van den Oord, Pushmeet Kohli
- Computer ScienceICML
- 15 February 2018
This paper motivates the use of adversarial risk as an objective, although it cannot easily be computed exactly, and frames commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarialrisk.
On Variational Bounds of Mutual Information
- Ben Poole, Sherjil Ozair, Aäron van den Oord, Alexander A. Alemi, G. Tucker
- Computer ScienceICML
- 16 May 2019
This work introduces a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance and demonstrates the effectiveness of these new bounds for estimation and representation learning.