No Language Left Behind: Scaling Human-Centered Machine Translation

NLLB Team, Marta Ruiz Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Alison Youngblood, Bapi Akula, Loïc Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon L. Spruit, C. Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang
Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this… 

Silo NLP's Participation at WAT2022

This paper provides the system description of “Silo NLP’s” submission to the Workshop on Asian Translation (WAT2022). The submission tops many tasks, including English → Hindi multimodal translation (evaluation test), English → Malayalam text-only and multimodal translation (evaluation test), and English → Bengali multimodal translation (challenge test).

NLP for Language Varieties of Italy: Challenges and the Path Forward

Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which implicitly encodes local knowledge, cultural traditions, artistic expression, and history of its speakers.

The first neural machine translation system for the Erzya language

We present the first neural machine translation system for translation between the endangered Erzya language and Russian and the dataset collected by us to train and evaluate it. The BLEU scores are

Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It

This work examines whether xLPLMs are absolutely superior to smaller-sized PLMs when fine-tuned for domain-specific MT. It uses the popular Marian Helsinki model as the smaller-sized PLM and two massive Mega-Transformers from Meta-AI as the xLPLMs.

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages

The IndicSUPERB benchmark is released, which shows that language-specific fine-tuned models are more accurate than the baseline on most of the tasks, including a large gap of 76% for the Language Identification task.