Language Models are Few-Shot Learners
- Tom B. Brown, Benjamin Mann, Dario Amodei
- Computer Science, Neural Information Processing Systems
- 28 May 2020
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
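As an illustration of the few-shot setting (task demonstrations supplied entirely in the prompt, with no gradient updates), here is a minimal sketch of a 3-digit-addition prompt; the exact format is an assumption, not the one used in the paper.

```python
# Minimal sketch of a few-shot prompt for 3-digit addition: the "training"
# examples live entirely in the context window, and the model is expected to
# complete the final line. The prompt format here is illustrative only.
examples = [(152, 421), (307, 598), (640, 273)]
prompt_lines = [f"Q: What is {a} plus {b}?\nA: {a + b}" for a, b in examples]
prompt_lines.append("Q: What is 512 plus 389?\nA:")  # query left for the model to complete
prompt = "\n\n".join(prompt_lines)
print(prompt)
```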
Zero-Shot Text-to-Image Generation
- A. Ramesh, Mikhail Pavlov, Ilya Sutskever
- Computer Science, International Conference on Machine Learning
- 24 February 2021
This work describes a simple approach based on a transformer that autoregressively models text and image tokens as a single stream of data, and shows that it is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
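A rough sketch of the single-stream layout described here: BPE text tokens and discrete image-codebook tokens are concatenated into one sequence and trained with ordinary next-token prediction. The vocabulary sizes and token values below are placeholders, not the paper's.

```python
import numpy as np

TEXT_VOCAB = 16384   # assumed BPE vocabulary size (placeholder)
IMAGE_VOCAB = 8192   # assumed discrete-VAE codebook size (placeholder)

text_tokens = np.array([17, 931, 4022])                          # toy tokenized caption
image_tokens = np.random.randint(0, IMAGE_VOCAB, size=32 * 32)   # toy 32x32 grid of image codes

# Offset the image codes so the two vocabularies do not collide, then concatenate
# into one autoregressive stream modeled with next-token prediction.
stream = np.concatenate([text_tokens, image_tokens + TEXT_VOCAB])
print(stream.shape)
```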
Evaluating Large Language Models Trained on Code
- Mark Chen, Jerry Tworek, Wojciech Zaremba
- Computer Science, ArXiv
- 7 July 2021
It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics, are discussed.
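Repeated sampling is scored with the pass@k metric; the paper's unbiased estimator, given n samples of which c pass the unit tests, is pass@k = 1 - C(n-c, k) / C(n, k), which can be computed stably as follows.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples with c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per problem, 13 of which pass the unit tests:
print(pass_at_k(n=200, c=13, k=1))
```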
Hierarchical Text-Conditional Image Generation with CLIP Latents
- A. Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- Computer Science, ArXiv
- 13 April 2022
This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
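With $x$ the image, $y$ the caption, and $z_i$ the CLIP image embedding, the two stages factor the generative model as

$$P(x \mid y) = P(x \mid z_i, y)\, P(z_i \mid y),$$

where the prior models $P(z_i \mid y)$ and the decoder models $P(x \mid z_i, y)$.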
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- Alex Nichol, Prafulla Dhariwal, Mark Chen
- Computer Science, International Conference on Machine Learning
- 20 December 2021
This work explores diffusion models for the problem of text-conditional image synthesis and compares two different guidance strategies: CLIP guidance and classifier-free guidance, finding that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
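Classifier-free guidance needs no separate classifier: the model's unconditional noise prediction is extrapolated toward the caption-conditioned one with a guidance scale $s$ (with $\varnothing$ denoting the empty caption):

$$\hat{\epsilon}_\theta(x_t \mid c) = \epsilon_\theta(x_t \mid \varnothing) + s \cdot \big(\epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid \varnothing)\big)$$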
Generative Pretraining From Pixels
- Mark Chen, Alec Radford, Ilya Sutskever
- Computer Science, International Conference on Machine Learning
- 12 July 2020
This work trains a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure, and finds that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification.
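Linear probing fits a linear classifier on frozen features from the pretrained model; a minimal sketch, with random arrays standing in for the extracted activations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 512))    # placeholder for frozen-model activations
train_labels = rng.integers(0, 10, size=1000)
test_feats = rng.normal(size=(200, 512))
test_labels = rng.integers(0, 10, size=200)

# The backbone stays frozen; only this linear classifier is trained.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
print("linear probe accuracy:", probe.score(test_feats, test_labels))
```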
Data and Parameter Scaling Laws for Neural Machine Translation
- Prafulla Dhariwal, Girish Sastry, Jonathan Deaton
- Computer Science
- 2021
We observe that the development cross-entropy loss of supervised neural machine translation models scales like a power law with the amount of training data and the number of non-embedding parameters…
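Schematically, with $N$ the number of non-embedding parameters and $D$ the amount of training data (and the constants below generic placeholders rather than the paper's exact parameterization), the observed behavior has the form

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}.$$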
Scaling Laws for Autoregressive Generative Modeling
- T. Henighan, J. Kaplan, Sam McCandlish
- Computer Science, ArXiv
- 28 October 2020
Empirical scaling laws for the cross-entropy loss are identified, strengthening the case that scaling laws have important implications for neural network performance, including on downstream tasks.
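Across modalities, the loss is fit as an irreducible term plus a power-law reducible term in the scaled resource $x$ (model size, data, or compute):

$$L(x) = L_\infty + \left(\frac{x_0}{x}\right)^{\alpha_x}$$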
Distribution Augmentation for Generative Modeling
- Heewoo Jun, Rewon Child, Ilya Sutskever
- Computer Science, International Conference on Machine Learning
- 2020
DistAug is presented, a simple and powerful method of regularizing generative models that applies augmentation functions to data and conditions the generative model on the specific function used, enabling aggressive augmentations more commonly seen in supervised and self-supervised learning.
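A minimal sketch of the idea: apply a randomly chosen augmentation to each example and give the model an explicit conditioning token identifying which augmentation was used, so that sampling with the identity token recovers the clean data distribution. The augmentations and token names here are illustrative, not the paper's.

```python
import random

# Toy augmentations over token sequences; stand-ins for the image transforms
# (flips, rotations, etc.) used in practice.
AUGMENTATIONS = {
    "identity": lambda x: x,
    "reverse": lambda x: x[::-1],
    "shift": lambda x: x[1:] + x[:1],
}

def distaug_example(tokens):
    name = random.choice(list(AUGMENTATIONS))
    augmented = AUGMENTATIONS[name](tokens)
    # The generative model is conditioned on this token, i.e. it learns
    # p(augmented data | augmentation identity).
    return f"<aug:{name}>", augmented

print(distaug_example([1, 2, 3, 4]))
```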
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
- Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen
- Computer Science, ArXiv
- 16 December 2022
This paper explores an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU and is one to two orders of magnitude faster to sample from than state-of-the-art methods, offering a practical trade-off for some use cases.
...