Anuar Sharafudinov

This paper presents the Kazakh Language Corpus (KLC), one of the first attempts within the local research community to assemble a Kazakh corpus. KLC is designed as a large-scale corpus containing over 135 million words and covering five stylistic genres: literary, publicistic, official, scientific, and informal. Along with its primary part…
We compare and discuss various approaches to part-of-speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh, a single root may produce hundreds of word forms, and it is difficult, if at all possible, to label enough training data to account for the vast set of all possible word forms…