Automatic Diacritic Restoration for Resource-Scarce Languages

@inproceedings{Pauw2007AutomaticDR,
  title={Automatic Diacritic Restoration for Resource-Scarce Languages},
  author={Guy De Pauw and Peter Waiganjo Wagacha and Gilles-Maurice de Schryver},
  booktitle={TSD},
  year={2007}
}
  • Guy De Pauw, Peter Waiganjo Wagacha, Gilles-Maurice de Schryver
  • Published in TSD 2007
  • Computer Science
  • The orthography of many resource-scarce languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are often represented in digital language resources as their unmarked equivalents. This renders corpus compilation more difficult, as these languages typically do not have the benefit of large electronic dictionaries to perform diacritic restoration. This paper describes experiments with a machine learning approach that is able to…
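The abstract notes that diacritically marked characters are often stored as their unmarked equivalents. The following is a minimal sketch of that lossy step only, not of the paper's restoration method, which the truncated abstract does not describe. The function name `strip_diacritics` is hypothetical, and the sketch assumes the marks can be expressed as Unicode combining characters.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Map each marked character to its unmarked base character.

    Hypothetical helper for illustration; it mimics how marked text
    ends up stored without diacritics, which restoration must undo.
    """
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (combining class != 0), keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("Gĩkũyũ"))  # prints "Gikuyu"
```

Diacritic restoration is the inverse problem: because several marked forms can collapse onto one unmarked string, recovering the original needs context or a learned model rather than a simple character table.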

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 29 CITATIONS

    Restoring Tone-Marks in Standard Yorùbá Electronic Text: Improved Model

    CITES METHODS, BACKGROUND & RESULTS
    HIGHLY INFLUENCED

    Diacritics Restoration Using Deep Neural Networks

    • Andrej Hucko, Peter Lacko
    • Computer Science
    • 2018 World Symposium on Digital Intelligence for Systems and Machines (DISA)
    • 2018
    CITES BACKGROUND
    HIGHLY INFLUENCED

    Improving Yorùbá Diacritic Restoration

    CITES BACKGROUND

    Investigating Input and Output Units in Diacritic Restoration

    • Sawsan Alqahtani, Mona Diab
    • Computer Science
    • 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
    • 2019
    CITES BACKGROUND

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 16 REFERENCES

    Diacritics Restoration: Learning from Letters versus Learning from Words

    HIGHLY INFLUENTIAL

    Corpus-based statements of meaning versus descriptions of actual language use in dictionaries

    • G. M. de Schryver
    • Culture, Language and Identity (CLIDE) Seminar, University of the Western Cape, Bellville, South Africa
    • 2007

    Creating a South African keyboard

    • D. Bailey
    • Afrilex 2006, The User Perspective in Lexicography, Programme and Abstracts, Pretoria, South Africa, (SF)² Press, pp. 17–18
    • 2006

    Development of a corpus for Gĩkũyũ using machine learning techniques

    • P. Wagacha, G. De Pauw, K. Getao
    • Proceedings of LREC workshop - Networking the development of language resources for African languages, Genoa, Italy, ELRA, pp. 27–30
    • 2006

    Amharic character recognition using a fast signature based algorithm

    • John Cowell, Fiaz Hussain
    • Computer Science
    • Proceedings on Seventh International Conference on Information Visualization, 2003. IV 2003.
    • 2003