Prompt-based Learning for Unpaired Image Captioning
@article{Zhu2022PromptbasedLF,
  title   = {Prompt-based Learning for Unpaired Image Captioning},
  author  = {Peipei Zhu and Xiao Wang and Lin Zhu and Zhenglong Sun and Weishi Zheng and Yaowei Wang and Chang Wen Chen},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2205.13125}
}
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs. Existing schemes usually adopt the visual concept reward of reinforcement learning to align visual concepts with images. However, this cross-domain alignment is usually weak, which severely constrains the overall performance of these existing schemes. Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of…
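To make the visual-concept-reward idea concrete, below is a minimal, hypothetical PyTorch sketch of how such a reward can drive a REINFORCE-style update of a captioning model; the reward form, function names, and toy inputs are illustrative assumptions, not the paper's implementation.

```python
import torch

def visual_concept_reward(caption_tokens, detected_concepts):
    """Fraction of detected visual concepts that appear in a sampled caption.
    A common form of the concept reward; exact definitions vary by paper."""
    hits = sum(1 for c in detected_concepts if c in caption_tokens)
    return hits / max(len(detected_concepts), 1)

def policy_gradient_loss(seq_log_prob, reward, baseline=0.0):
    """REINFORCE-style loss: increase the likelihood of captions whose
    reward exceeds the baseline (e.g., the reward of a greedy decode)."""
    return -(reward - baseline) * seq_log_prob

# Illustrative usage with a caption sampled from some captioning model:
caption = ["a", "dog", "runs", "on", "the", "grass"]
concepts = ["dog", "grass", "ball"]                  # from a visual concept detector
r = visual_concept_reward(caption, concepts)         # 2/3 here
log_prob = torch.tensor(-12.3, requires_grad=True)   # sum of token log-probs
loss = policy_gradient_loss(log_prob, r)
loss.backward()
```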
References
Showing 1–10 of 58 references
Exploring Semantic Relationships for Unpaired Image Captioning
- Computer Science, ArXiv
- 2021
This work achieves unpaired image captioning by bridging the vision and the language domains with high-level semantic information, and proposes the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.
Unpaired Image Captioning With semantic-Constrained Self-Learning
- Computer Science, IEEE Transactions on Multimedia
- 2022
A novel Semantic-Constrained Self-learning (SCS) framework that explores an iterative self-learning strategy to learn an image captioner with only unpaired image and text data, obtaining the best published CIDEr score to date.
Unsupervised Image Captioning
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This paper makes the first attempt to train an image captioning model in an unsupervised manner, requiring only an image set, a sentence corpus, and an existing visual concept detector.
Microsoft COCO: Common Objects in Context
- Computer Science, ECCV
- 2014
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene…
Towards Unsupervised Image Captioning With Shared Multimodal Embeddings
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper addresses image captioning by generating language descriptions of scenes without learning from annotated image-caption pairs, instead creating a shared latent space that is structured by visual concepts.
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
- Computer Science, ArXiv
- 2022
This work makes the first attempt to solve weakly-supervised visual concept recognition for UIC (WSUIC) based only on image-level labels, and designs an unrecognized-object (UnO) loss combined with a visual concept reward to better align the inferred object and relationship information with the images.
Learning to Prompt for Continual Learning
- Computer Science, ArXiv
- 2021
This work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time, and achieves competitive results against rehearsal-based methods even without a rehearsal buffer.
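To sketch the prompt-pool mechanism this describes (learnable prompts selected per input by a feature query, with no task label at test time), here is a hypothetical PyTorch module; the pool size, prompt length, and cosine-similarity lookup are illustrative choices rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Learnable prompt pool queried by input features (prompt-pool sketch)."""
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query):  # query: (B, dim), e.g., frozen-encoder features
        # Match each query against the pool keys by cosine similarity.
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys, dim=-1)  # (B, pool)
        idx = sim.topk(self.top_k, dim=-1).indices                        # (B, k)
        picked = self.prompts[idx]                  # (B, k, prompt_len, dim)
        # Flatten selected prompts into one sequence to prepend to the input tokens.
        return picked.flatten(1, 2)                 # (B, k * prompt_len, dim)
```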
Learning To Retrieve Prompts for In-Context Learning
- Computer Science, NAACL
- 2022
This work proposes an efficient method for retrieving prompts for in-context learning using annotated data and an LM, and trains an efficient dense retriever from this data, which is used to retrieve training examples as prompts at test time.
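The retrieve-then-prompt idea can be sketched as follows, with an off-the-shelf sentence-transformers encoder standing in for the paper's trained dense retriever; the model name and toy examples are assumptions for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in encoder; the paper trains a task-specific dense retriever instead.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    "Q: largest planet? A: Jupiter",
    "Q: capital of France? A: Paris",
    "Q: boiling point of water? A: 100 C",
]
train_emb = encoder.encode(train_examples, normalize_embeddings=True)

def retrieve_prompts(query, k=2):
    """Return the k training examples most similar to the query,
    to be concatenated as in-context prompts for the LM."""
    q = encoder.encode([query], normalize_embeddings=True)
    scores = train_emb @ q[0]          # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [train_examples[i] for i in top]

print(retrieve_prompts("Q: capital of Spain? A:"))
```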
OpenPrompt: An Open-source Framework for Prompt-learning
- Computer Science, ACL
- 2022
OpenPrompt is a unified, easy-to-use toolkit to conduct prompt-learning over PLMs, equipped with efficiency, modularity, and extendibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm.
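For a feel of that modular design, here is a minimal classification pipeline assembled from OpenPrompt's documented building blocks (a template, a verbalizer, and a wrapped PLM); the label words and backbone choice are toy placeholders.

```python
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification

# Load a PLM together with its tokenizer, config, and wrapper class.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# Template: where the input text and the mask slot go.
template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} It was {"mask"}.',
)

# Verbalizer: map vocabulary label words to class labels.
verbalizer = ManualVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": ["terrible"], "positive": ["great"]},
)

# Combine PLM, template, and verbalizer into a prompt-based classifier.
model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)
```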
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
- Computer Science, ACL
- 2022
This work studies prompt-based low-resource learning of VL tasks with a sequence-to-sequence transformer model using prefix language modeling and masked language modeling, and observes that, given larger training data, models with noisy prompts learn as quickly as models with hand-crafted prompts.
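As a text-only toy of the masked-language-modeling flavor of prompting with a sequence-to-sequence transformer (the paper's setting is vision-language; this sketch drops the visual input and uses T5's sentinel tokens):

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tok = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Masked-span prompt: the model fills the <extra_id_0> sentinel,
# analogous to a masked language modeling objective used for prompting.
inputs = tok("A photo of a <extra_id_0> sitting on a bench.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=False))
```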