Corpas na Gaeilge 1882-1926: Integrating Historical and Modern Irish Texts

Authors: Elaine Uí Dhonnchadha, Kevin P. Scannell, Ruairí Ó Huiginn, Eilís Ní Mhearraí, Máire Nic Mháolain, Brian Ó Raghallaigh, Gregory Toner, Séamus Mac Mathúna, Déirdre D'Auria, Eithne Ní Ghallchobháir, Niall O’Leary
This paper describes the processing of a corpus of seven million words of Irish texts from the period 1882-1926. The texts, which have been captured by typing or optical character recognition, are processed for the purpose of lexicography. First, all historical and dialectal word forms are annotated with their modern standard equivalents using software developed for this purpose. Then, on the basis of the modern standard annotations, the texts are processed using an existing finite-state morphological…
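The first step, annotating each historical or dialectal form with a modern standard equivalent, can be illustrated with a minimal sketch. It is not the software described in the paper: the lookup table `STANDARD_FORMS`, the `Token` type, and the `annotate` function are all hypothetical names introduced here, and the three spelling pairs are only illustrative examples of pre-standard versus standard orthography. A real standardiser would need far more than a dictionary lookup.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lookup table (an assumption for illustration only):
# pre-standard spelling -> modern standard spelling.
STANDARD_FORMS = {
    "oidhche": "oíche",
    "bliadhain": "bliain",
    "gaedhilge": "Gaeilge",
}


@dataclass
class Token:
    surface: str   # the form as it appears in the historical text
    standard: str  # the modern standard equivalent (or the surface form if unknown)


def annotate(text: str) -> List[Token]:
    """Pair every whitespace-separated word with a standard form.

    The original spelling is always kept, so the annotation is an extra
    layer on top of the text rather than a replacement of it.
    """
    tokens = []
    for word in text.split():
        # Look the word up case-insensitively; fall back to the word itself.
        standard = STANDARD_FORMS.get(word.lower(), word)
        tokens.append(Token(surface=word, standard=standard))
    return tokens


if __name__ == "__main__":
    for token in annotate("oidhche mhaith"):
        print(token.surface, "->", token.standard)
```

Keeping both the surface form and the standard form on each token is the design point of the sketch: downstream tools written for the modern standard language can read the `standard` field, while the lexicographer still sees the original spelling.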

