Collecting Language Resources for the Latvian e-Government Machine Translation Platform

@inproceedings{Rozis2016CollectingLR,
  title={Collecting Language Resources for the Latvian e-Government Machine Translation Platform},
  author={Roberts Rozis and Andrejs Vasiljevs and Raivis Skadins},
  booktitle={LREC},
  year={2016}
}
This paper describes the corpus collection activities carried out to build large machine translation systems for the Latvian e-Government platform. We describe the requirements for the corpora, the selection and assessment of data sources, the collection of public corpora, and the creation of new corpora from miscellaneous sources. The methodology, tools, and assessment methods are also presented, along with the results achieved, the challenges faced, and the conclusions drawn. Several approaches to addressing data scarcity are discussed…