Prompt-based Learning for Unpaired Image Captioning

@article{Zhu2022PromptbasedLF,
  title={Prompt-based Learning for Unpaired Image Captioning},
  author={Peipei Zhu and Xiao Wang and Lin Zhu and Zhenglong Sun and Weishi Zheng and Yaowei Wang and Chang Wen Chen},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.13125}
}
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs. Existing schemes usually adopt the visual concept reward of reinforcement learning to obtain the alignment between visual concepts and images. However, the cross-domain alignment is usually weak, which severely constrains the overall performance of these existing schemes. Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of…

References

Exploring Semantic Relationships for Unpaired Image Captioning
TLDR
This work achieves unpaired image captioning by bridging the vision and the language domains with high-level semantic information, and proposes the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.
Unpaired Image Captioning With Semantic-Constrained Self-Learning
TLDR
This work proposes a novel Semantic-Constrained Self-learning (SCS) framework that explores an iterative self-learning strategy to learn an image captioner with only unpaired image and text data, and obtains the best published CIDEr score to date.
Unsupervised Image Captioning
  • Yang Feng, Lin Ma, Wei Liu, Jiebo Luo
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
This paper makes the first attempt to train an image captioning model in an unsupervised manner; the approach requires an image set, a sentence corpus, and an existing visual concept detector.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene
Towards Unsupervised Image Captioning With Shared Multimodal Embeddings
TLDR
This paper addresses image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions by creating a shared latent space that is structured by visual concepts.
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
TLDR
This work makes the first attempt to solve the problem of weakly-supervised visual concept recognition for UIC (WSUIC) based only on image-level labels, and designs an unrecognized object (UnO) loss combined with a visual concept reward to improve the alignment of the inferred object and relationship information with the images.
Learning to Prompt for Continual Learning
TLDR
This work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time, and achieves competitive results against rehearsal-based methods even without a rehearsal buffer.
Learning To Retrieve Prompts for In-Context Learning
TLDR
This work proposes an efficient method for retrieving prompts for in-context learning using annotated data and an LM, and trains an efficient dense retriever from this data, which is used to retrieve training examples as prompts at test time.
OpenPrompt: An Open-source Framework for Prompt-learning
TLDR
OpenPrompt is a unified, easy-to-use toolkit to conduct prompt-learning over PLMs, equipped with efficiency, modularity, and extendibility; its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm.
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
TLDR
This work studies prompt-based low-resource learning of VL tasks with a sequence-to-sequence transformer model trained with prefix language modeling and masked language modeling, and observes that models with noisy prompts learn as quickly as models with hand-crafted prompts given larger training data.