Corpus ID: 48356442

Challenges of language technologies for the indigenous languages of the Americas

@inproceedings{Mager2018ChallengesOL,
  title={Challenges of language technologies for the indigenous languages of the Americas},
  author={Manuel Mager and Ximena Gutierrez-Vasques and Gerardo E Sierra and Ivan Vladimir Meza Ruiz},
  booktitle={COLING},
  year={2018}
}
Indigenous languages of the American continent are highly diverse. However, they have received little attention from the technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when distant languages and low-resource scenarios are faced. We would like to encourage NLP research in linguistically rich and diverse areas like the Americas. 
Addressing Challenges of Indigenous Languages through Neural Machine Translation: The case of Inuktitut-English
There is growing research interest in Indigenous languages, realities and challenges within the international NLP community. To date, these Indigenous languages have been very...
Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas
TLDR
The shared task featured two independent tracks, and participants submitted machine translation systems for up to 10 indigenous languages. For the majority of languages, many teams were able to improve considerably over the baseline.
A Critical Review of the Current State of Natural Language Processing in Mexico and Chile
This chapter presents a critical review of the current state of natural language processing in Chile and Mexico. Specifically, a general review is made regarding the technological evolution of these...
IndT5: A Text-to-Text Transformer for 10 Indigenous Languages
TLDR
IndT5, the first Transformer language model for Indigenous languages, is introduced and the application of IndT5 to machine translation is presented by investigating different approaches to translate between Spanish and the Indigenous languages as part of the AmericasNLP 2021 Shared Task on Open Machine Translation.
CPLM, a Parallel Corpus for Mexican Languages: Development and Interface
TLDR
The process of building the CPLM is described: text searching, digitization and alignment, some difficulties regarding dialectal and orthographic variation, and the interface, including the types of search and the use of filters.
Development of the Parallel Corpus of Mexican Languages (CPLM)
Mexico has a great language diversity. In addition to Spanish, there are 68 language groups and 364 variants (INALI, 2008), divided into 11 families. However, this wealth has been threatened due to...
A Mixtec-Spanish Parallel Corpus
Computational technologies have a key role in Computational Linguistics. Thanks to the capability of compiling and analyzing large collections of texts with computers, many resources and applications...
Development of Natural Language Processing Tools for Cook Islands Māori
TLDR
Three ongoing projects for NLP in Cook Islands Māori are presented: untrained forced alignment, speech-to-text and POS tagging. They include new resources that fill a gap in Australasian languages.
Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution)
TLDR
The submission of team “Tamalli” to the AmericasNLP 2021 shared task on Open Machine Translation for low-resource South American languages is described, with the second-best results for the language pairs “Spanish-Bribri”, “Spanish-Asháninka”, and “Spanish-Rarámuri”.
Revitalization of Indigenous Languages through Pre-processing and Neural Machine Translation: The case of Inuktitut
Indigenous languages have been very challenging for NLP tasks and applications for multiple reasons. These languages, in linguistic typology, are polysynthetic and highly...

References

SHOWING 1-10 OF 106 REFERENCES
Survey on the Use of Typological Information in Natural Language Processing
TLDR
This paper provides a systematic survey of existing typological resources and their use in NLP, as well as a discussion that, it is hoped, will both inform and inspire future work in the area.
On Achieving and Evaluating Language-Independence in NLP
TLDR
It is argued that, on the one hand, the authors are not truly evaluating language independence with any systematicity and on the other hand, that truly language-independent technology requires more linguistic sophistication than is the norm.
A Low-Resourced Peruvian Language Identification Model
TLDR
This work focuses on building a digital, annotated corpus for 16 Peruvian native languages extracted from documents in web repositories, and on fitting a supervised learning model for the language identification task using features identified from related studies in the state of the art, such as n-grams.
Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua
By adopting a “first-things-first” approach we overcome a number of challenges inherent in developing NLP Systems for resource-scarce languages. By first gathering the necessary corpora and lexicons...
Ship-LemmaTagger: Building an NLP Toolkit for a Peruvian Native Language
TLDR
This paper describes the implementation of a basic NLP toolkit for a new language, focusing on the features mentioned before and testing them on a corpus built for the occasion; the obtained results exceeded expectations and could be used for more complex tasks such as machine translation.
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
TLDR
Methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language are presented.
A Morphological Parser for Odawa
TLDR
The utility and design of a finite state parser, a widespread technology, for the Odawa dialect of Ojibwe (Algonquian, United States and Canada) is illustrated.
Rule-based machine translation for Aymara
This paper presents the ongoing result of an approach developed by the collaboration of a computational linguist with a field linguist that addresses one of the oft-overlooked keys to language...
Design and implementation of controlled elicitation for machine translation of low-density languages
TLDR
This work is building a tool that will elicit a controlled corpus from a bilingual speaker who is not an expert in linguistics, intended to cover major typological phenomena, as it is designed to work for any language.
Parsing a Polysynthetic Language
TLDR
The paper presents a formal description of selected properties of Aymara which are uncommon in well-researched Western languages and presents an experimental machine translation system into Spanish and English.