An empirical study on the effectiveness of images in Multimodal Neural Machine Translation

@inproceedings{Delbrouck2017AnES,
  title={An empirical study on the effectiveness of images in Multimodal Neural Machine Translation},
  author={Jean-Benoit Delbrouck and St{\'e}phane Dupont},
  booktitle={EMNLP},
  year={2017}
}
In state-of-the-art Neural Machine Translation (NMT), an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multi-modal tasks, where it becomes possible to focus both on sentence parts and image regions that they describe. In …
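The attention step described in the abstract can be sketched minimally as follows. This is a dot-product scoring variant with illustrative names (the cited papers also use additive scoring), not the authors' implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Score each source position against the current decoder state,
    then return the attention-weighted sum as the context vector."""
    scores = encoder_states @ decoder_state      # one score per source position
    weights = softmax(scores)                    # attention distribution over source
    context = weights @ encoder_states           # weighted sum of encoder states
    return context, weights

# Toy example: 4 source positions, hidden size 3 (random values).
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))
dec = rng.normal(size=3)
context, w = attention_context(dec, enc)
```

In the multimodal extensions surveyed below, a second attention of the same shape is computed over image region features, and the two context vectors are combined before predicting the target word.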
Probing the Role of Video Context in Multimodal Machine Translation with Source Corruption
Multimodal machine translation (MMT) is the task of utilising information from non-textual modalities to aid textual machine translation (MT). Thanks to the recent rapid development of deep neural …
Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions
This work proposes the application of semantic image regions for MNMT by integrating visual and textual features using two individual attention mechanisms (double attention), and demonstrates concrete improvements in translation performance from the use of semantic image regions.
Feature-level Incongruence Reduction for Multimodal Translation
This work proposes to extend the MNMT architecture with a harmonization network, which harmonizes multimodal features (linguistic and visual) via unidirectional modal space conversion, leading to performance competitive with the state of the art.
Visual Agreement Regularized Training for Multi-Modal Machine Translation
Visual agreement regularized training is presented to make better use of visual information in multi-modal machine translation; experiments show that the approaches can outperform competitive baselines by a large margin on the Multi30k dataset.
Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation
It is hypothesized that richer architectures, such as dense captioning models, may be more suitable for MNMT and could lead to improved translations; the approach is also extended to the word embeddings, where both linguistic and visual representations are computed for the corpus vocabulary.
Multi-modal neural machine translation with deep semantic interactions
This model extends conventional multi-modal NMT by introducing two attention neural networks, including a bi-directional attention network for modeling text and image representations, where the semantic representations of text are learned by referring to the image representation, and vice versa.
Distilling Translations with Visual Awareness
This work proposes a translate-and-refine approach in which images are used only by a second-stage decoder, and shows that it can recover from erroneous or missing words in the source language.
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
A novel graph-based multi-modal fusion encoder for NMT that captures various semantic relationships between multi-modal semantic units (words and visual objects) and provides an attention-based context vector for the decoder.
Modulating and attending the source image during encoding improves Multimodal Translation
We propose a new and fully end-to-end approach for multimodal translation where the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to …
Multimodal Sentence Summarization via Multimodal Selective Encoding
A multimodal selective gate network is proposed that considers reciprocal relationships between textual and multi-level visual features to select highlights of the event when encoding the source sentence.

References

Showing 1-10 of 28 references
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
The more advanced Multimodal Compact Bilinear pooling method, which takes the outer product of two vectors to combine the attention features for the two modalities, is evaluated for multimodal image caption translation and shows improvements over basic combination methods.
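The outer-product fusion this summary refers to can be illustrated with a toy sketch. Note the compact (MCB) variant approximates this product with count sketches to avoid the quadratic feature size; the code below shows only exact bilinear fusion, with illustrative names:

```python
import numpy as np

def bilinear_pool(text_feat, image_feat):
    """Full bilinear pooling: the flattened outer product captures all
    pairwise interactions between the two modalities' features."""
    return np.outer(text_feat, image_feat).ravel()

# Toy features: the fused vector has len(text) * len(image) entries,
# which is why MCB compresses it in practice.
t = np.array([1.0, 2.0])
v = np.array([3.0, 4.0, 5.0])
fused = bilinear_pool(t, v)
```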
Effective Approaches to Attention-based Neural Machine Translation
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on WMT translation tasks between English and German in both directions.
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural …
Does Multimodality Help Human and Machine for Translation and Image Captioning?
The systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge are presented: phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data.
Neural Machine Translation by Jointly Learning to Align and Translate
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target …
Attention-based Multimodal Neural Machine Translation
A novel neural machine translation architecture associating visual and textual features for translation tasks with multiple modalities, which outperforms the text-only baseline.
Neural Machine Translation of Rare Words with Subword Units
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 English-German and English-Russian translation tasks by 1.3 BLEU.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.
A Study of Translation Error Rate with Targeted Human Annotation
We define a new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. …