Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
@article{Ko2022LargescaleTG,
  title={Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works},
  author={Hyung-Kwon Ko and Gwanmo Park and Hyeon Jeon and Jaemin Jo and Juho Kim and Jinwook Seo},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.08477}
}
Large-scale Text-to-image Generation Models (LTGMs), such as DALL-E, are self-supervised deep learning models trained on huge datasets that have demonstrated the capacity to generate high-quality, open-domain images from multi-modal input. Although they can even produce anthropomorphized versions of objects and animals, combine irrelevant concepts in reasonable ways, and add variation to any user-provided image, we witnessed that such rapid technological advancement left many visual artists disoriented…
References
Showing 1–10 of 103 references
LAION-5B: An open large-scale dataset for training next generation image-text models
- Computer Science · ArXiv
- 2022
This work presents LAION-5B, a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, 2.32 billion of which are in English. It shows successful replication and fine-tuning of foundation models such as CLIP, GLIDE, and Stable Diffusion using the dataset, and discusses further experiments enabled by an openly available dataset of this scale.
Design Guidelines for Prompt Engineering Text-to-Image Generative Models
- Computer Science · CHI
- 2022
A study exploring which prompt keywords and model hyperparameters help produce coherent outputs from text-to-image generative models. The study structures prompts around subject and style keywords and investigates their success and failure modes.
Initial Images: Using Image Prompts to Improve Subject Representation in Multimodal AI Generated Art
- Art · Creativity & Cognition
- 2022
Advances in text-to-image generative models have made it easier for people to create art by just prompting models with text. However, creating through text leaves users with limited control over the…
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- Computer Science · ArXiv
- 2022
This work uses only 3–5 images of a user-provided concept to represent it through new “words” in the embedding space of a frozen text-to-image model; these words can be composed into natural language sentences, guiding personalized creation in an intuitive way.
Opal: Multimodal Image Generation for News Illustration
- Computer Science · UIST
- 2022
This paper discusses how structured exploration can help users better understand the capabilities of human-AI co-creative systems, and presents Opal, a system that produces text-to-image generations for news illustration.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Computer Science · ArXiv
- 2022
The Pathways Autoregressive Text-to-Image (Parti) model is presented, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge; the paper also explores and highlights the models' limitations.
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
- Computer Science · ECCV
- 2022
A novel text-to-image method that addresses these gaps by enabling a simple control mechanism complementary to text in the form of a scene, and by introducing elements that substantially improve the tokenization process through domain-specific knowledge over key image regions (faces and salient objects).
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
- Computer Science · ArXiv
- 2022
It is shown that recent text-to-image generative transformer models perform better in recognizing and counting objects than recognizing colors and understanding spatial relations, while there exists a large gap between the model performances and upper bound accuracy on all skills.
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
- Computer Science · ECCV
- 2022
Compared to several strong baselines, NÜWA achieves state-of-the-art results on text-to-image generation, text-to-video generation, video prediction, etc., and shows surprisingly good zero-shot capabilities on text- and video-manipulation tasks.
TaleBrush: Sketching Stories with Generative Pretrained Language Models
- Computer Science · CHI
- 2022
TaleBrush is introduced, a generative story ideation tool that uses line-sketching interactions with a GPT-based language model for control and sensemaking of a protagonist’s fortune in co-created stories; the paper also reflects on how sketching interactions can facilitate the iterative human-AI co-creation process.