Publications (sorted by influence)
Multi30K: Multilingual English-German Image Descriptions
TLDR: We introduce a large-scale dataset of images paired with sentences in English and German as an initial step towards studying the value and the characteristics of multilingual multimodal data.
  • Citations: 156
  • Highly influential citations: 28
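For concreteness, here is a minimal sketch, not from the Multi30K paper, of how one might read line-aligned English-German caption pairs of the kind the dataset distributes; the file names train.en and train.de are assumptions for illustration, not the dataset's canonical layout.

```python
# Minimal sketch (illustrative only): reading line-aligned English-German
# image descriptions. File names below are assumed, not Multi30K's official layout.
from pathlib import Path

def load_parallel_captions(en_path: Path, de_path: Path):
    """Yield (english, german) caption pairs; the two files are assumed line-aligned."""
    with en_path.open(encoding="utf-8") as en_f, de_path.open(encoding="utf-8") as de_f:
        for en_line, de_line in zip(en_f, de_f):
            yield en_line.strip(), de_line.strip()

if __name__ == "__main__":
    pairs = list(load_parallel_captions(Path("train.en"), Path("train.de")))
    print(f"{len(pairs)} caption pairs loaded; first pair: {pairs[0]}")
```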
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
TLDR: This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language.
  • Citations: 127
  • Highly influential citations: 15
Image Description using Visual Dependency Representations
TLDR: We introduce visual dependency representations to capture the relationships between the objects in an image, and hypothesize that this representation can improve image description.
  • Citations: 208
  • Highly influential citations: 13
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
TLDR: Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
  • Citations: 217
  • Highly influential citations: 13
How2: A Large-scale Dataset for Multimodal Language Understanding
TLDR: We introduce How2, a multimodal collection of instructional videos paired with spoken utterances, English subtitles and their crowdsourced Portuguese translations, as well as English video summaries.
  • Citations: 71
  • Highly influential citations: 13
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
TLDR: We present the results from the second shared task on multimodal machine translation and multilingual image description.
  • Citations: 89
  • Highly influential citations: 12
Multilingual Image Description with Neural Sequence Models
TLDR: In this paper we present an approach to multi-language image description, bringing together insights from neural machine translation and neural image description.
  • Citations: 53
  • Highly influential citations: 8
Imagination Improves Multimodal Translation
TLDR: We present a multitask learning model that decomposes multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations.
  • Citations: 69
  • Highly influential citations: 7
Multi-Language Image Description with Neural Sequence Models
TLDR: In this paper we present an approach to multi-language image description, bringing together insights from neural machine translation and neural image description.
  • Citations: 50
  • Highly influential citations: 6
Comparing Automatic Evaluation Measures for Image Description
TLDR: We estimate the correlation of unigram and Smoothed BLEU, TER, ROUGE-SU4, and Meteor against human judgements on two image description data sets.
  • Citations: 100
  • Highly influential citations: 4
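As an illustration of the kind of comparison described in the last entry, here is a minimal sketch, not taken from the paper, that correlates sentence-level smoothed BLEU scores against human judgements using NLTK and SciPy; the three example items and their human scores are invented for illustration.

```python
# Minimal sketch (illustrative only): correlating sentence-level smoothed BLEU
# with human judgements of image descriptions. Data below is made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import spearmanr

# Each item: (list of tokenized reference descriptions, tokenized candidate, human score 1-5).
items = [
    (["a dog runs across the grass".split()], "a dog running on grass".split(), 5),
    (["two people ride bicycles".split()],    "a man rides a horse".split(),    2),
    (["a child eats an apple".split()],       "a child eating fruit".split(),   4),
]

smooth = SmoothingFunction().method3  # smoothing avoids zero scores on short sentences
bleu_scores  = [sentence_bleu(refs, hyp, smoothing_function=smooth) for refs, hyp, _ in items]
human_scores = [judgement for _, _, judgement in items]

rho, p = spearmanr(bleu_scores, human_scores)
print(f"Spearman rho between smoothed BLEU and human judgements: {rho:.3f} (p={p:.3f})")
```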