Multi30K: Multilingual English-German Image Descriptions
TLDR
This dataset extends the Flickr30K dataset with i) German translations created by professional translators for a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions.
How2: A Large-scale Dataset for Multimodal Language Understanding
TLDR
How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations, is introduced, along with integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization.
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language.
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
TLDR
The results from the second shared task on multimodal machine translation and multilingual image description show that multimodal systems improved, but text-only systems remain competitive.
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
TLDR
This survey classifies existing approaches based on how they conceptualize the problem: as models that cast description either as a generation problem or as a retrieval problem over a visual or multimodal representational space.
Image Description using Visual Dependency Representations
TLDR
In an image description task, two template-based description generation models that operate over visual dependency representations outperform approaches that rely on object proximity or corpus information, on both automatic measures and human judgements.
Imagination Improves Multimodal Translation
TLDR
This work decomposes multimodal translation into two sub-tasks, learning to translate and learning visually grounded representations, and finds further improvements when the translation model is also trained on the external News Commentary parallel text dataset.
Multilingual Image Description with Neural Sequence Models
TLDR
An approach to multi-language image description that brings together insights from neural machine translation and neural image description is presented, showing significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.
Findings of the Third Shared Task on Multimodal Machine Translation
TLDR
Compared to the previous year, the performance of multimodal submissions improved, but text-only systems remain competitive.