ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback

Shiyue Zhang, Benjamin E. Frey, Mohit Bansal

We introduce ChrEnTranslate, an online machine translation demonstration system for translation between English and Cherokee, an endangered language. It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability, two user feedback interfaces (one for experts and one for common users), example inputs to collect human translations of monolingual data, word alignment visualization, and relevant terms from the Cherokee English…

Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation

Several advances in text translation are presented through a comparative analysis of two attention-based architectures, Transformers and RNNs, using several models that combine these architectures, two parallel corpora, and two tokenization techniques.

HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese

This work investigates the use of entity linking (EL) in downstream tasks and presents HOSMEL, the first modularized EL toolkit for Chinese designed for easy task adaptation, with three flexible usage modes, a live demo, and a demonstration video.

How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language

More than 43% of the languages spoken in the world are endangered, and language loss currently occurs at an accelerated rate because of globalization and neocolonialism. Saving and revitalizing…

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

ChrEn, a Cherokee-English parallel dataset for facilitating machine translation research between Cherokee and English, is introduced, together with several Cherokee-English and English-Cherokee machine translation systems.

Unsupervised Quality Estimation for Neural Machine Translation

This work proposes an unsupervised approach to quality estimation (QE) that requires no training and no resources beyond the MT system itself, yet achieves high correlation with human judgments of quality, rivaling state-of-the-art supervised QE models.
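Glass-box signals of this kind can be illustrated with a minimal Python sketch. It assumes the MT system can return the probability of each output token and, for the dropout variant, several re-scorings of the same output with dropout left on; the function names and the probabilities are hypothetical, and this is not the paper's code.

```python
import math
from statistics import mean, pvariance


def mean_log_prob(token_probs):
    """Average log-probability of the output tokens (closer to 0 = more confident)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def mean_entropy(step_distributions):
    """Average entropy of the decoder's per-step output distributions."""
    entropies = [-sum(p * math.log(p) for p in dist if p > 0)
                 for dist in step_distributions]
    return sum(entropies) / len(entropies)


def dropout_stats(passes):
    """Mean and variance of mean_log_prob over several dropout re-scorings.

    Each element of `passes` is the token-probability list obtained from one
    stochastic forward pass (assumed to be produced by the MT system).
    """
    scores = [mean_log_prob(tp) for tp in passes]
    return mean(scores), pvariance(scores)


# Hypothetical per-token probabilities for two candidate translations.
confident = [0.92, 0.85, 0.97, 0.90]
uncertain = [0.41, 0.18, 0.55, 0.30]
print(mean_log_prob(confident) > mean_log_prob(uncertain))  # True
```

A high average log-probability and a low variance across dropout passes both suggest a translation the model is sure of; neither needs labeled QE data.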

The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English

This work introduces the FLORES evaluation datasets for Nepali–English and Sinhala–English, based on sentences translated from Wikipedia, and demonstrates that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low-resource MT.

BERGAMOT-LATTE Submissions for the WMT20 Quality Estimation Shared Task

The authors' black-box QE models tied for the winning submission in four of the seven language pairs in Task 1. Their glass-box approaches also performed competitively, offering a lightweight alternative to the neural models.

Neural Machine Translation of Rare Words with Subword Units

This paper introduces an approach that is simpler and more effective than backing off to a dictionary: the NMT model becomes capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. Experiments show that subword models improve over a back-off dictionary baseline on the WMT 15 English-German and English-Russian translation tasks by up to 1.1 and 1.3 BLEU, respectively.
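A minimal Python sketch of learning byte pair encoding (BPE) merges on a toy vocabulary. Each word is written as space-separated symbols ending in an end-of-word marker </w>, and the most frequent adjacent pair is merged repeatedly; the function names and word counts are illustrative assumptions rather than any toolkit's API.

```python
import re
from collections import defaultdict


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_vocab(pair, vocab):
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent pair; returns (ordered merges, final vocab)."""
    merges = []
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: the first pair encountered wins
        vocab = merge_vocab(best, vocab)
        merges.append(best)
    return merges, vocab


toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(toy_vocab, num_merges=10)
print(merges[:3])  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is then applied in the same order to segment new text, so a rare word falls back to smaller, known subword units instead of an unknown-word token.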

Scalable Modified Kneser-Ney Language Model Estimation

We present an efficient algorithm to estimate large modified Kneser-Ney models, including interpolation. Streaming and sorting enable the algorithm to scale to much larger models by using a fixed…
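The three discounts that give modified Kneser-Ney its name can be computed from counts-of-counts. The Python sketch below assumes the standard formulas D1 = 1 - 2Y·n2/n1, D2 = 2 - 3Y·n3/n2 and D3+ = 3 - 4Y·n4/n3 with Y = n1/(n1 + 2·n2), where n_k is the number of n-grams seen exactly k times. It also shows the adjusted (continuation) counts used for lower orders. It does not reproduce the paper's streaming and sorting pipeline, and the bigram counts are made up for illustration.

```python
from collections import Counter


def modified_kn_discounts(ngram_counts):
    """Return (D1, D2, D3+) estimated from the counts-of-counts of one order."""
    n = Counter(ngram_counts.values())  # n[k] = number of n-grams seen exactly k times
    n1, n2, n3, n4 = n[1], n[2], n[3], n[4]
    if min(n1, n2, n3) == 0:
        raise ValueError("need n-grams seen once, twice and three times")
    y = n1 / (n1 + 2 * n2)
    return (1 - 2 * y * n2 / n1,
            2 - 3 * y * n3 / n2,
            3 - 4 * y * n4 / n3)


def continuation_counts(bigram_counts):
    """Adjusted unigram counts: number of distinct words observed before each word."""
    return Counter(w2 for (_w1, w2) in bigram_counts)


# Hypothetical bigram counts chosen so that n1=4, n2=2, n3=1, n4=1.
bigrams = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 1,
           ("b", "a"): 2, ("c", "b"): 2, ("a", "a"): 3, ("b", "b"): 4}
print(modified_kn_discounts(bigrams))  # (0.5, 1.25, 1.0)
```

In a full estimator the highest order is discounted using raw counts, while each lower order is discounted using continuation counts like those above.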

Cherokee Syllabary Texts: Digital Documentation and Linguistic Description

The Digital Archive of American Indian Languages Preservation and Perseverance (DAILP) is an innovative language revitalization project that seeks to provide digital infrastructure for the…

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
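A minimal Python sketch of corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty computed over the whole corpus. It assumes one tokenized reference per sentence and applies no smoothing; for reportable scores, an established implementation such as sacreBLEU is the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU over parallel lists of tokenized hypotheses and single references."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# All n-gram precisions are 1 here, so only the brevity penalty applies.
print(corpus_bleu([["a", "b", "c", "d"]], [["a", "b", "c", "d", "e"]]))  # ≈ 0.7788, i.e. exp(-0.25)
```

Accumulating matches and lengths over the whole corpus before combining them is what makes BLEU a corpus-level metric rather than an average of sentence scores.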

Minimum Error Rate Training in Statistical Machine Translation

It is shown that significantly better results can often be obtained if the final evaluation criterion is taken directly into account as part of the training procedure.
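The core step of minimum error rate training is an exact line search: along a direction in weight space, each hypothesis score is linear in the step size, so the selected hypothesis, and therefore the error, only changes where two lines cross. The Python sketch below assumes a simple additive per-sentence error (for example, a word error count) and fixed n-best lists. It finds crossing points by brute force over all hypothesis pairs instead of building the upper envelope, and it omits the outer loop over directions and n-best regeneration; all names are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def total_error(nbest_lists, weights):
    """Sum the errors of the top-scoring hypothesis in each n-best list.

    Each hypothesis is a (feature_vector, error) pair.
    """
    return sum(max(hyps, key=lambda h: dot(weights, h[0]))[1] for hyps in nbest_lists)


def line_search(nbest_lists, weights, direction):
    """Return (gamma, error) minimizing total_error at weights + gamma * direction."""
    breakpoints = set()
    for hyps in nbest_lists:
        # score of each hypothesis along the line: a + b * gamma
        lines = [(dot(weights, f), dot(direction, f)) for f, _ in hyps]
        for (a1, b1), (a2, b2) in combinations(lines, 2):
            if b1 != b2:
                breakpoints.add((a2 - a1) / (b1 - b2))
    points = sorted(breakpoints)
    if points:
        # one gamma inside every interval between (and beyond) the breakpoints
        candidates = [points[0] - 1.0, points[-1] + 1.0]
        candidates += [(x + y) / 2 for x, y in zip(points, points[1:])]
    else:
        candidates = [0.0]
    best_gamma, best_error = 0.0, float("inf")
    for gamma in candidates:
        w = [wi + gamma * di for wi, di in zip(weights, direction)]
        err = total_error(nbest_lists, w)
        if err < best_error:
            best_gamma, best_error = gamma, err
    return best_gamma, best_error


# Hypothetical data: one sentence, two hypotheses as (features, error count).
nbest = [[([1.0, 0.0], 1), ([0.0, 1.0], 0)]]
print(line_search(nbest, weights=[1.0, 0.0], direction=[0.0, 1.0]))  # (2.0, 0)
```

Because the error is piecewise constant between crossing points, evaluating one step size per interval is enough to find the exact minimum along the line, even though the error surface is not differentiable.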

A Systematic Comparison of Various Statistical Alignment Models

An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
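For contrast with the refined models, here is a minimal Python sketch of EM training for IBM Model 1, the simplest alignment model in this family, which has neither a first-order dependence nor a fertility model. The NULL sentinel, function names and toy German-English corpus are illustrative assumptions, not code from the paper.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical sentinel for the empty target word


def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1; returns t[(f, e)] = p(source word f | target word e)."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in pairs:
            es = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer over possible links
                for e in es:
                    c = t[(f, e)] / z  # expected count of the link f-e
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per target word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(fs, es, t):
    """Most probable target position for each source word (None = NULL)."""
    options = [None] + list(range(len(es)))
    return [max(options, key=lambda j: t[(f, NULL if j is None else es[j])])
            for f in fs]


corpus = [(["das", "Haus"], ["the", "house"]),
          (["das", "Buch"], ["the", "book"]),
          (["ein", "Buch"], ["a", "book"])]
t = train_ibm1(corpus)
print(align(["das", "Haus"], ["the", "house"], t))  # [0, 1]
```

Model 1 treats every alignment position as equally likely, which is exactly the limitation that first-order (HMM) and fertility-based models address.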