A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions

@inproceedings{Schamoni2018ADA,
  title={A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions},
  author={Shigehiko Schamoni and Julian Hitschler and Stefan Riezler},
  booktitle={AMTA},
  year={2018}
}
We present a dataset and method for improving the translation of noisy image captions that were created by users of Wikimedia Commons. The dataset is multilingual but non-parallel, and is several orders of magnitude larger than existing parallel data for multimodal machine translation. Our retrieval-based method pivots on similar images and uses the…