Corpus ID: 8107519

First Result on Arabic Neural Machine Translation

@article{Almahairi2016FirstRO,
  title={First Result on Arabic Neural Machine Translation},
  author={Amjad Almahairi and Kyunghyun Cho and Nizar Habash and Aaron C. Courville},
  journal={ArXiv},
  year={2016},
  volume={abs/1606.02680}
}
Neural machine translation has become a major alternative to the widely used phrase-based statistical machine translation. We notice, however, that much of the research on neural machine translation has focused on European languages despite its language-agnostic nature. In this paper, we apply neural machine translation to the task of Arabic translation (Ar↔En) and compare it against a standard phrase-based translation system. We run an extensive comparison using various configurations in preprocessing…

Tables from this paper

Citations

A Recipe for Arabic-English Neural Machine Translation

TLDR
It is found that tuning a model trained on the whole data using a small, high-quality corpus like Ummah gives a substantial improvement, and that training a neural system with a small Arabic-English corpus is competitive with a traditional phrase-based system.

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

TLDR
This paper systematically compares neural and statistical MT models for Arabic-English translation on data preprocessed by various prominent tokenization schemes and shows that the best choice of tokenization scheme depends largely on the type of model and the size of the data.

Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation

TLDR
A small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text corpus and its equivalent Amharic translation available on Tanzil.

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

TLDR
This work compares standard phrase-based and neural systems on Arabic-Hebrew translation, and experiments with tokenization by external tools and sub-word modeling by character-level neural models show that both methods lead to improved translation performance, with a small advantage to the neural models.

Arabic–Chinese Neural Machine Translation: Romanized Arabic as Subword Unit for Arabic-sourced Translation

TLDR
Extensive experiments on Arabic-Chinese translation demonstrate that the proposed approaches can effectively tackle the UNK problem and significantly improve the translation quality for Arabic-sourced translation.

Improved Arabic-Chinese Machine Translation with Linguistic Input Features

TLDR
Results of a preliminary evaluation show that the use of linguistic features on the Arabic side considerably outperforms the baseline and tokenized approaches, and that the system can consistently reduce the OOV rate as well.

TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation

TLDR
TURJUMAN exploits the recently introduced text-to-text Transformer AraT5 model, endowing it with a powerful ability to decode into Arabic and making it well suited for acquiring paraphrases of the MSA translations as an added value.

Transliteration of Algerian Arabic dialect into Modern Standard Arabic

TLDR
A character-level neural transliteration method for converting Arabizi into Arabic script is presented, and it is found that NMTR outperforms SMTR by 2.18%.

A technical reading in statistical and neural machines translation (SMT & NMT)

TLDR
A survey of the state of the art of statistical machine translation and neural machine translation is presented, where the context of the current research studies is described, and the main strengths and limitations of the two approaches are reviewed.

Comparison between Neural and Statistical translation after transliteration of Algerian Arabic Dialect

TLDR
An Arabic dialect translation system composed of two modules, transliteration and translation, each developed with both a statistical and a neural model; results show that a good transliteration improves the translation results.

References

Showing 1–10 of 27 references

Orthographic and morphological processing for English–Arabic statistical machine translation

TLDR
The results show that the best performing tokenization scheme is that of the Penn Arabic Treebank, and training on orthographically normalized text then jointly enriching and detokenizing the output outperforms training on enriched text.

Segmentation for English-to-Arabic Statistical Machine Translation

TLDR
It is shown that morphological decomposition of the Arabic source is beneficial, especially for smaller-size corpora, and recombination techniques are investigated, and the use of Factored Translation Models for English-to-Arabic translation is reported on.

Arabic Preprocessing Schemes for Statistical Machine Translation

TLDR
The results show that given large amounts of training data, splitting off only proclitics performs best, and choosing the appropriate preprocessing produces a significant increase in BLEU score if there is a change in genre between training and test data.
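As a rough illustration of what such preprocessing schemes do, a toy proclitic-splitting pass over Buckwalter-transliterated Arabic might look like the sketch below. The rule list and the greedy peeling are hypothetical simplifications (real schemes such as those in the cited work use morphological analysis, e.g. MADA, to avoid over-splitting):

```python
# Toy list of Arabic proclitics in Buckwalter transliteration:
# conjunctions (w, f), prepositions (b, l, k), definite article (Al).
PROCLITICS = ["w", "f", "b", "l", "k", "Al"]

def split_proclitics(token):
    """Greedily peel known proclitics off the front of a token (sketch).

    Each peeled proclitic is emitted as a separate 'p+' segment, mirroring
    the segmented representations used by Arabic preprocessing schemes.
    """
    parts = []
    changed = True
    while changed:
        changed = False
        for p in PROCLITICS:
            # Only split if a non-trivial stem remains after peeling.
            if token.startswith(p) and len(token) > len(p) + 1:
                parts.append(p + "+")
                token = token[len(p):]
                changed = True
                break
    return parts + [token]

# "wbAlqlm" ("and with the pen") splits into conjunction, preposition,
# article, and stem:
print(split_proclitics("wbAlqlm"))  # → ['w+', 'b+', 'Al+', 'qlm']
```

A greedy character matcher like this over-splits ambiguous tokens, which is exactly why the choice of scheme (and whether to split at all) matters so much in the experiments summarized above.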

On Using Very Large Target Vocabulary for Neural Machine Translation

TLDR
It is shown that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary.
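The core trick summarized above, normalising the output softmax over only a small candidate shortlist rather than the full target vocabulary, can be illustrated schematically. The sizes and the random shortlist below are made up for illustration and do not reproduce the paper's candidate-selection scheme:

```python
import numpy as np

def subset_softmax(logits, candidate_ids):
    """Softmax restricted to a candidate shortlist of the vocabulary.

    Only the shortlist entries are exponentiated and normalised, so the
    per-step decoding cost scales with len(candidate_ids), not |V|.
    """
    sub = logits[candidate_ids]
    sub = np.exp(sub - sub.max())  # subtract max for numerical stability
    return sub / sub.sum()

rng = np.random.default_rng(1)
vocab_size = 500_000                       # full target vocabulary
logits = rng.normal(size=vocab_size)       # one decoder step's scores
shortlist = rng.choice(vocab_size, size=30_000, replace=False)
probs = subset_softmax(logits, shortlist)  # normalised over 30k words only
```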

Neural Machine Translation by Jointly Learning to Align and Translate

TLDR
It is conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
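The (soft-)search described above amounts to an additive attention step: score each source annotation against the current decoder state, normalise the scores with a softmax, and take the weighted average as a context vector. A minimal numpy sketch with arbitrary made-up dimensions, not the authors' implementation:

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, Wa, Ua, va):
    """Bahdanau-style additive attention (sketch).

    decoder_state:  (d,)   previous decoder hidden state
    encoder_states: (T, h) source-side annotations
    Wa: (k, d), Ua: (k, h), va: (k,)  learned projections (random here)
    Returns the soft alignment (T,) and the context vector (h,).
    """
    # Score each source annotation against the decoder state.
    scores = np.tanh(encoder_states @ Ua.T + Wa @ decoder_state) @ va  # (T,)
    # Softmax: a soft alignment over source positions, no hard segments.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: expectation of the annotations under the alignment.
    context = weights @ encoder_states  # (h,)
    return weights, context

rng = np.random.default_rng(0)
T, d, h, k = 5, 4, 6, 3
w, c = additive_attention(rng.normal(size=d), rng.normal(size=(T, h)),
                          rng.normal(size=(k, d)), rng.normal(size=(k, h)),
                          rng.normal(size=k))
```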

Statistical Machine Translation Features with Multitask Tensor Networks

TLDR
A three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in applying neural networks to SMT, that augments the neural network architecture with tensor layers capturing important higher-order interactions among the network units.

Statistical Phrase-Based Translation

TLDR
The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.

A Character-level Decoder without Explicit Segmentation for Neural Machine Translation

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental…
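The contrast with word-level modelling shows up already in how a sentence is represented: a character-level decoder emits one symbol per character and needs neither a tokenizer nor a large closed vocabulary. A toy illustration (not the cited paper's model):

```python
# Word-level modelling needs explicit segmentation and a per-word
# vocabulary; character-level modelling sidesteps both.
sentence = "neural machine translation"

# Word-level view: 3 tokens from an explicit whitespace segmentation.
word_vocab = {w: i for i, w in enumerate(sorted(set(sentence.split())))}
word_ids = [word_vocab[w] for w in sentence.split()]   # 3 tokens

# Character-level view: 26 symbols, tiny vocabulary, no segmentation.
char_vocab = {ch: i for i, ch in enumerate(sorted(set(sentence)))}
char_ids = [char_vocab[ch] for ch in sentence]         # 26 symbols
```

The trade-off is longer sequences (26 steps instead of 3) in exchange for an open vocabulary, which is what makes the character-level decoder attractive for morphologically rich languages such as Arabic.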

Recurrent Continuous Translation Models

We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences…

The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus

TLDR
This paper will address pertinent Arabic language issues as they relate to methodology choices, explain the choice to use the Penn English Treebank style of guidelines, and show several ways in which human annotation is important and automatic analysis difficult.