Corpus ID: 237213404

Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021

@article{Puranik2021AttentiveFO,
  title={Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021},
  author={Karthik Puranik and Adeep Hande and Ruba Priyadharshini and Thenmozhi Durairaj and Anbukkarasi Sampath and Kingston Pal Thamburaj and Bharathi Raja Chakravarthi},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.08556}
}
This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English→Marathi and English⇔Irish language pairs of the LoResMT 2021 shared task. The task focuses on producing good translations for low-resourced languages such as Irish and Marathi. We fine-tune IndicTrans, a pretrained multilingual NMT model, for English→Marathi using an external parallel corpus as additional training data. We use a pretrained Helsinki-NLP Opus MT English⇔Irish model for…
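
For the English⇔Irish direction, the abstract refers to a pretrained Helsinki-NLP Opus MT model. Below is a minimal inference sketch, assuming the system builds on the publicly released Helsinki-NLP/opus-mt-en-ga checkpoint on Hugging Face; the example sentence is illustrative only, and the paper's actual fine-tuning pipeline is not reproduced here.

```python
# Minimal inference sketch for a pretrained Opus MT English->Irish model.
# Assumption: the publicly released Helsinki-NLP/opus-mt-en-ga checkpoint
# corresponds to the pretrained model the abstract describes.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-ga"  # English -> Irish
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize a source sentence and generate its Irish translation.
batch = tokenizer(["The hospital is closed today."], return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Fine-tuning such a checkpoint on an in-domain parallel corpus, as the paper does for IndicTrans, would follow the standard sequence-to-sequence training loop of the same library.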

Citations

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages
We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. …
gaHealth: An English–Irish Bilingual Corpus of Health Data
TLDR: This study outlines the process used in developing gaHealth, the first bilingual corpus of health data for the Irish language, defines linguistic guidelines for its development, and empirically demonstrates the benefits of using an in-domain dataset for the health domain.
Pegasus@Dravidian-CodeMix-HASOC2021: Analyzing Social Media Content for Detection of Offensive Text
TLDR: This research paper employs two Transformer-based prototypes that ranked in the top 8 for all tasks of the HASOC–Dravidian-CodeMix FIRE 2021 shared task, and introduces two inventive methods for detecting offensive comments/posts in Tamil and Malayalam.
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
TLDR: Describes the work for the shared task conducted by Dravidian-CodeMix at FIRE 2021, employing pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset, its transliteration (TRAI), English translations (TRAA) of the TRAI data, and the combination of all three.

References

Showing 1–10 of 62 references
Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation
TLDR: This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking the challenging Japanese–Russian pair for benchmarking; the approach improves translation quality by more than 3.7 BLEU points.
The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT
TLDR: Describes a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages, together with tools for creating state-of-the-art translation models from that collection, aiming to trigger the development of open translation tools and models with much broader coverage of the world's languages.
Multilingual NMT with a Language-Independent Attention Bridge
TLDR: Proposes a new framework for the efficient development of multilingual neural machine translation (NMT) using a language-independent attention bridge and scheduled training; it achieves substantial improvements over strong bilingual models and performs well for zero-shot translation, demonstrating its capacity for abstraction and transfer learning.
Deep Learning Approach to English-Tamil and Hindi-Tamil Verb Phrase Translations
TLDR: Presents a deep learning methodology for English-Tamil and Hindi-Tamil verb phrase (VP) translations, adopting a neural machine translation model to implement the methodology.
Unsupervised Approach for Zero-Shot Experiments: Bhojpuri–Hindi and Magahi–Hindi@LoResMT 2020
TLDR: Presents an unsupervised domain adaptation approach that gives promising results for zero- or extremely low-resource languages, using a hybrid of domain adaptation and back-translation.
Semi-Supervised Learning for Neural Machine Translation
TLDR: This work proposes a semi-supervised approach for training NMT models on the concatenation of labeled (parallel) and unlabeled (monolingual) data, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.
Beyond English-Centric Multilingual Machine Translation
TLDR: This work creates a true many-to-many multilingual translation model that can translate directly between any pair of 100 languages, and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models.
Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder
TLDR: This paper presents a first attempt at building a multilingual Neural Machine Translation framework under a unified approach, in which information shared among languages can help the translation of individual language pairs, and points out a novel way to make use of monolingual data with Neural Machine Translation.
Revisiting Low Resource Status of Indian Languages in Machine Translation
TLDR: This paper provides and analyses an automated framework for obtaining a corpus for Indian-language neural machine translation (NMT) systems, and evaluates design choices such as the choice of pivot language and the effect of iteratively and incrementally increasing corpus size.
WordNet Gloss Translation for Under-resourced Languages using Multilingual Neural Machine Translation