Automatic linguistic annotation of historical language: ToTrTaLe and XIX century Slovene

@inproceedings{Erjavec2011AutomaticLA,
  title={Automatic linguistic annotation of historical language: ToTrTaLe and XIX century Slovene},
  author={Tomaz Erjavec},
  booktitle={LaTeCH@ACL},
  year={2011}
}
The paper describes a tool developed to process historical (Slovene) text, which annotates words in a TEI-encoded corpus with their modern-day equivalents, morphosyntactic tags and lemmas. Such a tool is useful for developing historical corpora of highly inflecting languages, for enabling full-text search in digital libraries of historical texts, for modernising such texts for today's readers, and for simplifying the correction of OCR transcriptions.
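The abstract describes annotating each word of a TEI corpus with a modern-day equivalent, a lemma and a morphosyntactic tag. The following is a minimal sketch of that idea only. It is not ToTrTaLe's implementation, and it rests on several assumptions. First, it uses a hypothetical rule table (`RULES`) that maps the older Bohorič alphabet to modern Slovene spelling. Second, a tiny hypothetical lexicon (`LEXICON`) supplies lemmas and MULTEXT-East-style tags. Third, the annotations are stored as `norm`, `lemma` and `msd` attributes on a TEI `<w>` element. The function names `transcribe` and `annotate` are illustrative.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI elements without an "ns0:" prefix

# Hypothetical transcription rules from the older Bohorič alphabet to modern
# Slovene spelling (assumption for this sketch). Digraphs come first so they
# win over single letters.
RULES = [("ſh", "š"), ("sh", "ž"), ("zh", "č"), ("ſ", "s"), ("s", "z"), ("z", "c")]

# Hypothetical lexicon: modern form -> (lemma, MULTEXT-East-style MSD tag).
LEXICON = {
    "šola": ("šola", "Ncfsn"),
    "človek": ("človek", "Ncmsn"),
}


def transcribe(word: str) -> str:
    """Rewrite a lowercase historical word in a single left-to-right pass."""
    out, i = [], 0
    while i < len(word):
        for old, new in RULES:
            if word.startswith(old, i):
                out.append(new)
                i += len(old)  # skip past the match so output is never re-scanned
                break
        else:  # no rule matched: copy the character unchanged
            out.append(word[i])
            i += 1
    return "".join(out)


def annotate(word: str) -> ET.Element:
    """Wrap a historical token in a TEI <w> carrying its modern form, lemma and tag."""
    norm = transcribe(word)
    # Unknown words keep their modern form as lemma and get the residual tag "X".
    lemma, msd = LEXICON.get(norm, (norm, "X"))
    w = ET.Element(f"{{{TEI_NS}}}w", {"norm": norm, "lemma": lemma, "msd": msd})
    w.text = word  # the original (historical) spelling stays as the element text
    return w


if __name__ == "__main__":
    for token in ["ſhola", "zhlovek", "kruh"]:
        print(ET.tostring(annotate(token), encoding="unicode"))
```

The single pass is important here. If the rules were applied as successive global string replacements, the `s` produced from `ſ` would be rewritten again into `z`.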
This paper has 20 citations.
