Corpus ID: 174792744

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages

@article{Poncelas2019AdaptingNT,
  title={Adapting NMT to caption translation in Wikimedia Commons for low-resource languages},
  author={Alberto Poncelas and Kepa Sarasola and Meghan Dowling and Andy Way and Gorka Labaka and I{\~n}aki Alegria},
  journal={Procesamiento del Lenguaje Natural},
  year={2019},
  volume={63},
  pages={33--40}
}
This paper presents a successful domain adaptation of a general neural machine translation (NMT) system, using a bilingual corpus created from captions for images in Wikimedia Commons for the Spanish–Basque and English–Irish language pairs. 

Citations

Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation
This work uses transductive data-selection algorithms, which exploit information from the test set to retrieve sentences from a larger parallel set, and achieves better performance than either a generic model or a domain-adapted model.
The Impact of Indirect Machine Translation on Sentiment Classification
This work proposes using a machine translation system to translate customer feedback into another language, in order to investigate in which cases translated sentences can have a positive or negative impact on an automatic sentiment classifier.
Machine Translation Summit XVII Proceedings of The 8th Workshop on Patent and Scientific Literature Translation
This paper proposes a hybrid data–model parallel approach for sequence-to-sequence (Seq2Seq) recurrent neural network (RNN) machine translation, achieving a 4.20× speed-up on 4 GPUs relative to 1 GPU without affecting translation accuracy as measured by BLEU scores.
Improving transductive data selection algorithms for machine translation
This thesis explores how Infrequent N-gram Recovery (INR) and FDA can also improve NMT models using just a fraction of the available data.

References

Showing 1–10 of 25 references
The ADAPT System Description for the IWSLT 2018 Basque to English Translation Task
Back-translated data is used to create new sentences and to translate sentences that are close to the test set, so that the model can be fine-tuned to the document to be translated.
SMT versus NMT: Preliminary comparisons for Irish
A preliminary comparison of statistical machine translation and neural machine translation for English→Irish in the fixed domain of public administration shows that while an out-of-the-box NMT system may not fare quite as well as the authors' tailor-made domain-specific SMT system, the future may still be promising for EN→GA NMT.
OpenNMT: Open-Source Toolkit for Neural Machine Translation
The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements.
Stanford Neural Machine Translation Systems for Spoken Language Domains
This work further explores the effectiveness of NMT in spoken language domains by participating in the MT track of the IWSLT 2015 and demonstrates that using an existing NMT framework can achieve competitive results in the aforementioned scenarios when translating from English to German and Vietnamese.
Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
This paper presents how a state-of-the-art SMT system is enriched with an extra in-domain parallel corpus extracted from Wikipedia; the authors argue this can be very useful for languages with a limited amount of parallel corpora, where in-domain data is crucial to improving the performance of MT systems.
Tapadóir: developing a statistical machine translation engine and associated resources for Irish
It is shown that the Tapadóir SMT system outperforms Google Translate™ as a result of steps taken to tailor translation output to the user's specific needs.
Feature decay algorithms for neural machine translation
It is revealed that it is possible to find a subset of sentence pairs that outperforms the full training corpus by 1.11 BLEU points when used to train a German–English NMT system.
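The feature-decay idea behind this reference can be illustrated with a toy selector (a hypothetical sketch with made-up helper names, not the authors' implementation): each training sentence is scored by the test-set n-grams it covers, and an n-gram's value is multiplied by a decay factor each time an already-selected sentence covers it, pushing later picks toward uncovered test-set vocabulary.

```python
from collections import Counter

def feats(sent):
    """Unigram and bigram features of a tokenized sentence."""
    return {(t,) for t in sent} | set(zip(sent, sent[1:]))

def fda_select(pool, test_sents, k, decay=0.5):
    """Greedily pick k sentences from `pool` (lists of tokens).
    A test n-gram's contribution is decay**(times already covered),
    normalized by sentence length. Toy FDA sketch only."""
    test_feats = set().union(*(feats(s) for s in test_sents))
    counts = Counter()  # how often each test n-gram has been covered
    pool = list(pool)
    selected = []
    for _ in range(min(k, len(pool))):
        def score(sent):
            covered = feats(sent) & test_feats
            return sum(decay ** counts[f] for f in covered) / len(sent)
        best = max(range(len(pool)), key=lambda i: score(pool[i]))
        chosen = pool.pop(best)
        selected.append(chosen)
        for f in feats(chosen) & test_feats:
            counts[f] += 1
    return selected
```

For example, with a test sentence `["the", "cat", "is", "here"]`, sentences sharing "the cat" are picked first, but their shared n-grams are worth less on the second pick, so an off-topic sentence like `["dogs", "bark"]` is selected last.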
Wikipedia as Multilingual Source of Comparable Corpora
An automatic method to build comparable corpora from Wikipedia using categories as topic restrictions; it builds a corpus of texts in the two selected languages whose content is focused on the selected topic.
Bleu: a Method for Automatic Evaluation of Machine Translation
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
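The metric this reference describes can be sketched in a few lines. The following is a simplified single-reference, unsmoothed sentence-level variant (an illustration of the idea, not the paper's exact corpus-level formulation): clipped n-gram precisions for n = 1..4 are combined by geometric mean and multiplied by a brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision
    with a brevity penalty (single reference, no smoothing).
    Assumes non-empty token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: only punish hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * geo_mean
```

For instance, `bleu("the cat sat on the mat".split(), "the cat sat on a mat".split())` yields roughly 0.54, while a hypothesis identical to the reference scores exactly 1.0.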
Neural machine translation of Basque
It is demonstrated that significant gains can be obtained with a neural network approach for this challenging language pair, and optimal configurations in terms of word segmentation and decoding parameters are described.
...