Corpus ID: 214623492

Improving Yorùbá Diacritic Restoration

  • Iroro Orife, D. Adelani, Timi E. Fasubaa, Victor Williamson, Wuraola Fisayo Oyewusi, Olamilekan Wahab, Kola Tubosun
  • Published 2020
  • Computer Science
  • arXiv: Computation and Language
  • Yorùbá is a widely spoken West African language with a writing system rich in orthographic and tonal diacritics. These diacritics provide morphological information, are crucial for lexical disambiguation and pronunciation, and are vital for any computational speech or natural language processing task. However, diacritic marks are commonly excluded from electronic texts because of limited device and application support and limited general education on proper usage. We report on recent efforts at dataset cultivation…
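
The lexical ambiguity described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the helper name `strip_diacritics` is hypothetical, and the word list is a commonly cited set of Yorùbá near-homographs chosen here for illustration. The sketch only shows how removing diacritics collapses distinct words into one string, which is the loss a restoration model has to undo.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (tone marks and under-dots) from text.

    Hypothetical helper for illustration; it simulates how diacritics
    are lost in typical electronic text.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


# Illustrative word set (an assumption, not data from the paper):
# farm, husband, vehicle, hoe.
words = ["oko", "ọkọ", "ọkọ̀", "ọkọ́"]

# Map each distinct word to its undiacritized form.
collapsed = {w: strip_diacritics(w) for w in words}

for original, plain in collapsed.items():
    print(f"{original} -> {plain}")
```

All four words reduce to the same undiacritized string, so a reader or a model seeing only that string must rely on context to recover the intended word.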

