Extension of Zipf's Law to Word and Character N-grams for English and Chinese

@article{Ha2003ExtensionOZ,
  title={Extension of Zipf's Law to Word and Character N-grams for English and Chinese},
  author={Le Quan Ha and Elvira I. Sicilia-Garcia and Ji Ming and Francis Jack Smith},
  journal={IJCLCLP},
  year={2003},
  volume={8}
}
It is shown that, for a large corpus, Zipf's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below that predicted by Zipf's law for English words at ranks greater than about 5,000 and for Chinese characters at ranks greater than about 1,000. However, when single words or characters are combined together with n-gram words or characters in one list and put in order of frequency, the frequency of tokens in the combined list follows…
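
The combined-list construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy sentence, and the choice of a least-squares fit on log-log axes are all assumptions made for illustration. It pools n-grams of every order up to `max_n` into one frequency list, ranks them, and estimates the slope of log(frequency) against log(rank); under the classical form of Zipf's law (frequency inversely proportional to rank) that slope would be close to -1 on a large corpus.

```python
from collections import Counter
import math


def combined_ngram_counts(tokens, max_n=3):
    """Count all n-grams of order 1..max_n in one shared Counter (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def rank_frequency(counts):
    """Return frequencies in descending order; position + 1 is the rank."""
    return sorted(counts.values(), reverse=True)


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input only; a meaningful fit needs a corpus of many thousands of tokens.
tokens = "the cat sat on the mat and the cat ran".split()
counts = combined_ngram_counts(tokens, max_n=2)  # unigrams and bigrams together
freqs = rank_frequency(counts)
slope = zipf_slope(freqs)
```

For Chinese text, the same functions could be applied to a list of characters instead of whitespace-separated words, for example `list(text)`, assuming the text has already been stripped of punctuation.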