The Tagged Icelandic Corpus (MÍM)

@inproceedings{Helgadttir2012TheTI,
  title={The Tagged Icelandic Corpus (M{\'I}M)},
  author={Sigr{\'u}n Helgad{\'o}ttir and {\'A}sta Svavarsd{\'o}ttir and Eir{\'i}kur R{\"o}gnvaldsson and Krist{\'i}n Bjarnad{\'o}ttir and Hrafn Loftsson},
  year={2012}
}
In this paper, we describe the development of a morphosyntactically tagged corpus of Icelandic, the MÍM corpus. The corpus consists of about 25 million tokens of contemporary Icelandic texts collected from varied sources during the years 2006–2010. The corpus is intended for use in Language Technology projects and for linguistic research. We describe briefly other Icelandic corpora and how they differ from the MÍM corpus. We describe the text selection and collection for MÍM, both for written…
