Word statistics of Turkish language on a large scale text corpus - TurCo

@article{Dalkili2004WordSO,
  title={Word statistics of Turkish language on a large scale text corpus - TurCo},
  author={G{\"o}khan Dalkiliç and Yalçin Çebi},
  journal={International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.},
  year={2004},
  volume={2},
  pages={319--323}
}
Determining the statistical properties of a natural language is one of the most important parts of language analysis. The number of different words (NODW) and the different word usage ratio (DWUR) are general characteristics of a corpus. These values are defined and calculated for the Turkish corpus (TurCo). Word n-grams are also calculated for Turkish; this was done for English years ago but could not be done for Turkish because of the lack of a large-scale corpus…
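The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact definition of DWUR, so the sketch assumes it is the number of different words divided by the total number of words. The tokenizer (whitespace splitting plus stripping of ASCII punctuation) and the function names `tokenize`, `corpus_stats`, and `word_ngrams` are likewise hypothetical choices made for illustration.

```python
import string
from collections import Counter


def tokenize(text):
    """Split on whitespace and strip surrounding ASCII punctuation.

    Assumption: a deliberately naive tokenizer. Case folding is skipped
    because Python's str.lower() does not apply the Turkish dotted and
    dotless i rules.
    """
    stripped = (word.strip(string.punctuation) for word in text.split())
    return [word for word in stripped if word]


def corpus_stats(tokens):
    """Return (total words, NODW, DWUR) for a list of tokens.

    Assumption: DWUR is taken here as NODW / total words.
    """
    total = len(tokens)
    nodw = len(set(tokens))
    dwur = nodw / total if total else 0.0
    return total, nodw, dwur


def word_ngrams(tokens, n):
    """Count word n-grams as tuples of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = tokenize("bir iki bir üç bir iki")
total, nodw, dwur = corpus_stats(tokens)
bigrams = word_ngrams(tokens, 2)
```

For the six-word sample above there are three different words, so the assumed ratio is 0.5, and the most frequent bigram is `("bir", "iki")` with a count of 2.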