nocaps: novel object captioning at scale

@inproceedings{Agrawal2019nocapsNO,
  title={nocaps: novel object captioning at scale},
  author={Harsh Agrawal and Karan Desai and Yufei Wang and Xinlei Chen and Rishabh Jain and Mark Johnson and Dhruv Batra and Devi Parikh and Stefan Lee and Peter Anderson},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={8947-8956}
}
Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale…

Citations
Partially-supervised novel object captioning leveraging context from paired data
In this paper, we propose an approach to improve image captioning solutions for images with novel objects that do not have caption labels in the training dataset. Our approach is agnostic to model…
VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training
TLDR: The results show that the VIsual VOcabulary pre-training model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects.
Learning to Select: A Fully Attentive Approach for Novel Object Captioning
TLDR: This paper presents a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly.
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
TLDR: The results show that the VIsual VOcabulary pre-training (VIVO) model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects.
Leveraging Human Attention in Novel Object Captioning
  • Xianyu Chen, Ming Jiang, Qi Zhao
  • IJCAI, 2021
Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous…
Self-Distillation for Few-Shot Image Captioning
  • Xianyu Chen, Ming Jiang, Qi Zhao
  • 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
TLDR: An ensemble-based self-distillation method is proposed that allows image captioning models to be trained with unpaired images and captions, along with a simple yet effective pseudo-feature generation method based on gradient descent.
Captioning Images with Novel Objects via Online Vocabulary Expansion
TLDR: A method is proposed that can describe images with novel objects without retraining, using word embeddings of the objects estimated from only a small number of image features of the objects.
ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning
TLDR: The ECOL-R model is proposed, a copy-augmented transformer that is encouraged, via a specialised reward function in the SCST reinforcement learning framework, to accurately describe novel object labels: the reward promotes novel object mentions while maintaining caption quality.
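For context on the training objective named here: SCST (self-critical sequence training) scores a sampled caption against the model's own greedy decode and uses the difference as the reward signal. A minimal sketch of that loss with hypothetical tensor shapes, not the ECOL-R implementation or its specialised reward:

import torch

def scst_loss(sampled_logprobs, sampled_reward, greedy_reward):
    # sampled_logprobs: (batch,) summed log-probs of the sampled caption's tokens
    # sampled_reward, greedy_reward: (batch,) caption-level rewards, e.g. CIDEr
    advantage = (sampled_reward - greedy_reward).detach()  # greedy decode acts as the baseline
    return -(advantage * sampled_logprobs).mean()          # REINFORCE with baseline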
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
TLDR: This paper attempts to show the biased nature of currently existing image captioning models, presents a new image captioning dataset, Egoshots, consisting of 978 real-life images with no captions, and proposes a new image captioning metric, object-based Semantic Fidelity (SF).
A Meta Learning Approach to Novel Image Captioning
Image captioning for images that contain novel objects not seen in training data is a difficult yet very valuable…

References

Showing 1-10 of 59 references
Captioning Images with Diverse Objects
TLDR: The Novel Object Captioner (NOC) is proposed, a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets, taking advantage of external sources: labeled images from object recognition datasets and semantic knowledge extracted from unannotated text.
Partially-Supervised Image Captioning
TLDR: This work proposes a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences, which it represents using finite state automata, and shows that it can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
TLDR: The Deep Compositional Captioner (DCC) is proposed to address the task of generating descriptions of novel objects which are not present in paired image-sentence datasets, by leveraging large object recognition datasets and external text corpora and by transferring knowledge between semantically similar concepts.
Guided Open Vocabulary Image Captioning with Constrained Beam Search
TLDR: This work uses constrained beam search to force the inclusion of selected tag words in the output, and fixed, pretrained word embeddings to facilitate vocabulary expansion to previously unseen tag words, achieving state-of-the-art results for out-of-domain captioning on MSCOCO (and improved results for in-domain captioning).
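To illustrate the idea behind constrained beam search: each partial hypothesis tracks which required tag words it has emitted so far, hypotheses are grouped by that constraint state so constrained continuations are not crowded out, and only captions satisfying every constraint may finish. A toy sketch with a hypothetical step_fn decoder interface, not the paper's implementation:

def constrained_beam_search(step_fn, bos, eos, constraints, beam_size=3, max_len=20):
    # step_fn(tokens) -> list of (next_token, logprob) continuations (hypothetical interface)
    # constraints: set of token ids that must appear in any accepted caption
    beams = [(0.0, [bos], frozenset())]  # (score, tokens, satisfied constraints)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, toks, sat in beams:
            for tok, lp in step_fn(toks):
                new_sat = sat | ({tok} & constraints)
                if tok == eos:
                    if new_sat == constraints:  # accept only fully-constrained captions
                        finished.append((score + lp, toks + [tok]))
                else:
                    candidates.append((score + lp, toks + [tok], new_sat))
        grouped = {}  # keep a separate beam per constraint-satisfaction state
        for cand in sorted(candidates, key=lambda c: -c[0]):
            grouped.setdefault(cand[2], []).append(cand)
        beams = [c for group in grouped.values() for c in group[:beam_size]]
        if not beams:
            break
    return max(finished, key=lambda f: f[0])[1] if finished else None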
Decoupled Novel Object Captioner
TLDR: The Decoupled Novel Object Captioner (DNOC) framework is proposed, which can fully decouple the language sequence model from the object descriptions; experimental results on the held-out MSCOCO dataset demonstrate DNOC's ability to describe novel concepts.
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
TLDR: A new architecture is proposed that incorporates copying into the convolutional plus recurrent neural network (CNN + RNN) image captioning framework for describing novel objects in captions, with superior results reported when compared to state-of-the-art deep models.
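The heart of such a copying mechanism is a gate that mixes the decoder's usual vocabulary distribution with a distribution over detected object words; a minimal sketch with hypothetical tensor shapes, not the paper's exact architecture:

import torch

def copy_mixture(gen_logits, copy_scores, copy_token_ids, gate):
    # gen_logits: (batch, vocab) decoder scores over the full vocabulary
    # copy_scores: (batch, k) scores for k detected object words
    # copy_token_ids: (batch, k) long tensor of vocabulary ids for those words
    # gate: (batch, 1) probability of copying a detected word rather than generating
    p_gen = torch.softmax(gen_logits, dim=-1)
    p_copy = torch.zeros_like(p_gen).scatter_add_(
        -1, copy_token_ids, torch.softmax(copy_scores, dim=-1))
    return (1 - gate) * p_gen + gate * p_copy  # final distribution over the next word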
Rich Image Captioning in the Wild
  • Kenneth Tran, X. He, Lei Zhang, Jian Sun
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
TLDR: An image captioning system that addresses the new challenges of automatically describing images in the wild by developing a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output.
Neural Baby Talk
TLDR: A novel framework for image captioning is introduced that produces natural language explicitly grounded in entities found by object detectors in the image, reaching state-of-the-art results on both the COCO and Flickr30k datasets.
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety…
From captions to visual concepts and back
TLDR: This paper uses multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives, and develops a maximum-entropy language model.
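In a multiple-instance-learning setup like this, an image is treated as a bag of regions and a word detector should fire if any region supports the word; a common formulation for that is noisy-OR, sketched below (an assumption about the exact variant, with hypothetical shapes):

import torch

def noisy_or_word_probs(region_word_probs):
    # region_word_probs: (num_regions, vocab) per-region word probabilities
    # Noisy-OR bag probability: P(word | image) = 1 - prod_over_regions(1 - p_region)
    return 1 - torch.prod(1 - region_word_probs, dim=0)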