Proximal Policy Optimization Algorithms
- J. Schulman, F. Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
- Computer Science, ArXiv
- 20 July 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
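The "surrogate" objective the abstract refers to is PPO's clipped objective, which can be sketched in a few lines (a minimal NumPy illustration; the function name and array inputs are ours, not from the paper):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective: take the minimum of the
    unclipped and clipped ratio-weighted advantage, which removes the
    incentive to push the probability ratio outside [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# With a positive advantage, a ratio above 1 + eps contributes only
# the clipped value (1.2 * 1.0 here), so the gradient incentive stops.
obj = ppo_clip_objective(np.array([1.5]), np.array([1.0]))
```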
Language Models are Few-Shot Learners
- Tom B. Brown, Benjamin Mann, Dario Amodei
- Computer Science, Neural Information Processing Systems
- 28 May 2020
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Glow: Generative Flow with Invertible 1x1 Convolutions
- Diederik P. Kingma, Prafulla Dhariwal
- Computer Science, Neural Information Processing Systems
- 9 July 2018
Glow, a simple type of generative flow using an invertible 1x1 convolution, is proposed, demonstrating that a generative model optimized towards the plain log-likelihood objective is capable of efficient realistic-looking synthesis and manipulation of large images.
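The invertible 1x1 convolution is a learned channel-mixing matrix applied at every spatial position; because it is invertible, its log-determinant enters the log-likelihood via the change-of-variables formula. A minimal NumPy sketch (function names are ours; Glow additionally uses an LU parameterization for efficiency):

```python
import numpy as np

def invertible_1x1_conv(x, W):
    """Apply a learned channel-mixing matrix W to an input of shape
    (H, W, C) and return the output plus the log-determinant term
    that the change-of-variables formula adds to the log-likelihood."""
    h, w, c = x.shape
    y = x @ W.T  # mix channels independently at every spatial position
    logdet = h * w * np.log(abs(np.linalg.det(W)))
    return y, logdet

def invert(y, W):
    """The inverse of the 1x1 convolution is multiplication by W^{-1}."""
    return y @ np.linalg.inv(W).T
```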
Diffusion Models Beat GANs on Image Synthesis
- Prafulla Dhariwal, Alex Nichol
- Computer Science, Neural Information Processing Systems
- 11 May 2021
It is shown that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models, and classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256 × 256 and 3.85 on ImageNet 512 × 512.
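Classifier guidance shifts the mean of each reverse-diffusion Gaussian step by the gradient of a noise-aware classifier's log-probability for the target class. A one-line sketch under that description (function name and the scalar-variance simplification are ours):

```python
import numpy as np

def guided_reverse_mean(mu, sigma2, classifier_grad, scale=1.0):
    """Classifier guidance: shift the reverse-process mean by the
    gradient of log p(class | x_t), scaled by the step variance and
    a guidance scale; larger scale trades diversity for fidelity."""
    return mu + scale * sigma2 * classifier_grad
```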
Hierarchical Text-Conditional Image Generation with CLIP Latents
- A. Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- Computer Science, ArXiv
- 13 April 2022
This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
Variational Lossy Autoencoder
- Xi Chen, Diederik P. Kingma, P. Abbeel
- Computer Science, International Conference on Learning Representations
- 4 November 2016
This paper presents a simple but principled method to learn global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE, and PixelRNN/CNN, greatly improving the generative modeling performance of VAEs.
Improved Denoising Diffusion Probabilistic Models
- Alex Nichol, Prafulla Dhariwal
- Computer Science, International Conference on Machine Learning
- 18 February 2021
This work shows that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality, and finds that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality.
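The learned variances mentioned in the abstract are parameterized as a log-space interpolation between the two natural extremes, the forward-process variance beta_t and the posterior variance beta-tilde_t, with the interpolation weight predicted by the network. A minimal sketch (function name is ours):

```python
import numpy as np

def learned_reverse_variance(v, beta_t, beta_tilde_t):
    """Improved DDPM variance parameterization: interpolate in log
    space between beta_t and the posterior variance beta_tilde_t,
    with v in [0, 1] predicted by the model per dimension."""
    return np.exp(v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t))
```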
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- Alex Nichol, Prafulla Dhariwal, Mark Chen
- Computer Science, International Conference on Machine Learning
- 20 December 2021
This work explores diffusion models for the problem of text-conditional image synthesis and compares two different guidance strategies: CLIP guidance and classifier-free guidance, finding that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
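Classifier-free guidance, the strategy the paper finds human evaluators prefer, combines the model's conditional and unconditional noise predictions by extrapolating past the conditional one. A one-line sketch (function name is ours):

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Classifier-free guidance: mix conditional and unconditional
    noise predictions; scale > 1 extrapolates in the conditional
    direction, trading sample diversity for caption alignment."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```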
Generative Pretraining From Pixels
- Mark Chen, Alec Radford, Ilya Sutskever
- Computer Science, International Conference on Machine Learning
- 12 July 2020
This work trains a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure, and finds that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification.
Jukebox: A Generative Model for Music
- Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
- Computer Science, ArXiv
- 30 April 2020
It is shown that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes, and can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.
...