Corpus ID: 626964

Images as Context in Statistical Machine Translation

@inproceedings{Calixto2012ImagesAC,
  title={Images as Context in Statistical Machine Translation},
  author={Iacer Calixto and Te{\'o}filo Em{\'i}dio de Campos and Lucia Specia},
  year={2012}
}
This paper reports on ongoing experiments exploring the use of images to provide additional context for statistical machine translation (SMT). We investigate whether this contextual information can help address two well-known challenges in machine translation: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslated). As a motivating example, consider Figure 1, which depicts a news headline extracted from the BBC…
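To make the ambiguity challenge concrete, here is a minimal sketch (not the authors' system) of how labels predicted from an accompanying image could select among translation candidates for a word with multiple senses; the classifier output and the sense inventory below are hypothetical stand-ins:

from typing import Dict, List, Tuple

# Hypothetical sense inventory for the ambiguous English word "seal":
# each candidate Portuguese translation lists image labels that support it.
SENSES: Dict[str, List[str]] = {
    "foca": ["seal", "sea lion", "animal", "ocean"],   # the animal
    "selo": ["stamp", "wax", "document", "envelope"],  # the stamp
}

def image_labels(image_path: str) -> List[Tuple[str, float]]:
    # Stand-in for a real image classifier (e.g., a pre-trained CNN);
    # hard-coded here so the sketch runs without any image model.
    return [("seal", 0.71), ("ocean", 0.15), ("rock", 0.08)]

def disambiguate(image_path: str) -> str:
    # Pick the translation whose supporting labels best match the image.
    labels = dict(image_labels(image_path))
    return max(SENSES, key=lambda t: sum(labels.get(l, 0.0) for l in SENSES[t]))

print(disambiguate("headline_photo.jpg"))  # -> "foca" (the animal sense)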

Using Images to Improve Machine-Translating E-Commerce Product Listings

This paper studies how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT model and a Statistical Machine Translation (SMT) model.

Multimodal Pivots for Image Caption Translation

This work presents an approach to improving the statistical machine translation of image descriptions through multimodal pivots defined in visual space; the approach relies on large, readily available datasets of monolingually captioned images and on state-of-the-art convolutional neural networks to compute image similarities.
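A minimal sketch of the pivot idea, assuming a torchvision ResNet-50 as the image encoder and a toy in-memory caption store (the file names are hypothetical, and the paper's actual pipeline differs in detail):

import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()
cnn = resnet50(weights=weights)
cnn.fc = torch.nn.Identity()  # keep the 2048-d pooled visual features
cnn.eval()

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.nn.functional.normalize(cnn(x), dim=-1).squeeze(0)

# Hypothetical monolingual (German) caption collection, indexed by image.
captioned = {
    "dog_park.jpg":  "Ein Hund spielt im Park.",
    "beach_dog.jpg": "Ein Hund läuft am Strand.",
    "red_car.jpg":   "Ein rotes Auto auf der Straße.",
}
bank = {p: embed(p) for p in captioned}

def pivot_captions(query_image: str, k: int = 2) -> list:
    # Return captions of the k images most similar to the query image,
    # to be used as target-side evidence for translating its description.
    q = embed(query_image)
    ranked = sorted(bank, key=lambda p: float(q @ bank[p]), reverse=True)
    return [captioned[p] for p in ranked[:k]]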

Machine Translation with Image Context from Mandarin Chinese to English

This work aims to produce a neural machine translation model that is capable of accepting both text and image context as a multimodal translator from Mandarin Chinese to English.

Translating short segments with NMT: a case study in English-to-Hindi

The results indicate that the Transformer model outperforms others in the large data setting in a number of automatic metrics and manual evaluation, and it also produces the fewest truncated sentences.

Incorporating visual information into neural machine translation

This work proposes different models for incorporating images into MT via transfer learning from convolutional neural networks pre-trained for image classification, and puts forward one model that incorporates local visual features into NMT.
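As an illustration of the two feature types involved, here is a sketch assuming a torchvision ResNet-50 as the pre-trained classifier (the paper's CNN may differ): a global pooled vector, plus a grid of local features that a decoder attention mechanism could attend over:

import torch
from torchvision.models import resnet50, ResNet50_Weights

cnn = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
# Everything up to (not including) global pooling -> local feature grid.
trunk = torch.nn.Sequential(*list(cnn.children())[:-2])

@torch.no_grad()
def visual_features(image_batch: torch.Tensor):
    local = trunk(image_batch)                # (B, 2048, 7, 7) for 224x224 input
    global_vec = local.mean(dim=(2, 3))       # (B, 2048) pooled global vector
    # Flatten the grid into 49 "visual words" for decoder attention.
    local = local.flatten(2).transpose(1, 2)  # (B, 49, 2048)
    return global_vec, local

g, l = visual_features(torch.randn(1, 3, 224, 224))
print(g.shape, l.shape)  # torch.Size([1, 2048]) torch.Size([1, 49, 2048])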

Multi30K: Multilingual English-German Image Descriptions

This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions.
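A small sketch of reading the line-aligned text files of such a dataset; the file names below are assumptions modelled on the public Multi30K release, so adjust them to the copy you have:

from pathlib import Path

def load_multi30k(root: str, split: str = "train"):
    # Hypothetical file layout: one sentence per line, files line-aligned,
    # with the corresponding image name on the same line of the image list.
    en = Path(root, f"{split}.en").read_text(encoding="utf-8").splitlines()
    de = Path(root, f"{split}.de").read_text(encoding="utf-8").splitlines()
    imgs = Path(root, f"{split}_images.txt").read_text(encoding="utf-8").splitlines()
    assert len(en) == len(de) == len(imgs), "files must be line-aligned"
    return list(zip(imgs, en, de))

# for image, english, german in load_multi30k("multi30k/data/task1"):
#     ...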

Using Visual Feature Space as a Pivot Across Languages

This work shows that models trained to generate textual captions in more than one language conditioned on an input image can leverage their jointly trained feature space during inference to pivot across languages.
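A toy sketch of this pivoting, assuming one jointly trained image projection shared by per-language caption decoders (module sizes and architecture are illustrative, not the paper's):

import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    # Toy GRU decoder conditioned on a shared image embedding.
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, image_vec, tokens):
        h0 = image_vec.unsqueeze(0)  # image embedding as initial hidden state
        hidden, _ = self.gru(self.embed(tokens), h0)
        return self.out(hidden)

image_encoder = nn.Linear(2048, 256)  # stand-in for a CNN feature projection
decoders = {"en": CaptionDecoder(10_000), "de": CaptionDecoder(12_000)}

# Pivot at inference: encode the image once, then decode in either language
# from the same shared visual feature space.
image_vec = image_encoder(torch.randn(1, 2048))
en_logits = decoders["en"](image_vec, torch.zeros(1, 5, dtype=torch.long))
de_logits = decoders["de"](image_vec, torch.zeros(1, 5, dtype=torch.long))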

Incorporating Global Visual Features into Attention-based Neural Machine Translation

This work introduces multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder, and reports new state-of-the-art results.
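One of the integration points described can be sketched as follows, assuming the global image vector is projected into the word-embedding space and prepended to the source sequence as a pseudo-token (other variants initialise the encoder or decoder state instead; dimensions are illustrative):

import torch
import torch.nn as nn

emb_dim, img_dim = 512, 2048
word_embed = nn.Embedding(10_000, emb_dim)
img_proj = nn.Linear(img_dim, emb_dim)  # maps CNN features into embedding space

def source_with_image(token_ids: torch.Tensor, image_feats: torch.Tensor):
    # Prepend the projected image vector as the first source "word",
    # so encoder self-attention can attend to it like any other token.
    words = word_embed(token_ids)                 # (B, T, emb_dim)
    img_tok = img_proj(image_feats).unsqueeze(1)  # (B, 1, emb_dim)
    return torch.cat([img_tok, words], dim=1)     # (B, T+1, emb_dim)

seq = source_with_image(torch.randint(0, 10_000, (2, 7)), torch.randn(2, img_dim))
print(seq.shape)  # torch.Size([2, 8, 512])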

An error analysis for image-based multi-modal neural machine translation

An extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models that integrate visual features into different parts of both the encoder and the decoder finds that not only are translations of terms with a strong visual connotation improved, but that almost all kinds of errors decrease when multi-modal models are used.

Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles

It is found that humans preferred translations obtained with a PBSMT system to both text-only and multi-modal NMT over 56% of the time, but preferred multi-modal NMT to its text-only counterpart, which suggests that images do help NMT in this use case.