Type-token & Hapax-token Relation: A Combinatorial Model

@article{Milika2009TypetokenH,
  title={Type-token \& Hapax-token Relation: A Combinatorial Model},
  author={Jiř{\'i} Mili{\v{c}}ka},
  journal={Glottotheory},
  year={2009},
  volume={2},
  pages={99--110}
}
If we consider the type-token relation to be a feature of a text and not of a language, we can approach a theoretically based and precise description of this relation. Such a description will suit the demands of text linguistics better than the empirical laws that are used nowadays. This paper offers a model of the relation based on the combinatorial characterization of the distribution of types in a text. This method is subsequently used to formulate a model of the hapax-token relation and the subject is…
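For readers unfamiliar with the two quantities the abstract names, the following is a minimal Python sketch of how the empirical type-token and hapax-token curves of a single text can be counted. It is not the paper's combinatorial model, only the observed curves that such a model would aim to describe. The function name `type_and_hapax_curves` and the toy sentence are illustrative assumptions, as is tokenization by whitespace.

```python
from collections import Counter


def type_and_hapax_curves(tokens):
    """Return two lists giving, after each token of the text,
    the number of distinct types and the number of hapaxes
    (types seen exactly once so far).

    Illustrative helper only; not taken from the paper.
    """
    counts = Counter()
    types, hapaxes = [], []
    n_hapax = 0
    for tok in tokens:
        counts[tok] += 1
        if counts[tok] == 1:
            # First occurrence: a new type, which is also a new hapax.
            n_hapax += 1
        elif counts[tok] == 2:
            # Second occurrence: the type stops being a hapax.
            n_hapax -= 1
        types.append(len(counts))
        hapaxes.append(n_hapax)
    return types, hapaxes


# Toy text, tokenized by whitespace (an assumption for illustration).
tokens = "the cat sat on the mat and the dog sat".split()
types, hapaxes = type_and_hapax_curves(tokens)
```

Plotting `types` and `hapaxes` against the token index gives the two curves for the text at hand, which fits the abstract's view of the relation as a property of a text rather than of a language.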

Rank-frequency Relation & Type-token Relation: Two Sides of the Same Coin

TLDR
No approximations or assumptions are needed: the formulae can be derived purely algebraically from the rank-frequency relation or from any type of frequency distribution, and the type-token relation can be computed from the hapax-token relation.

Types, Tokens, and Hapaxes: A New Heap’s Law

TLDR
Here the authors derive from first principles a completely novel expression of the type-token curve and prove its superior accuracy on real text, which naturally generalizes to equally accurate estimates for counting hapaxes and higher n-legomena.

Measuring Lexical Richness through Type-Token Curve: a Corpus-Based Analysis of Arabic and English Texts

WordSmith Tools (5.0) is used to analyze samples from texts of different genres written by eight different authors. These texts are grouped into two corpora: Arabic and English. The Arabic corpus

Transitions of Tonality: A Model-Based Corpus Study

TLDR
The study shows that chord progressions are largely asymmetrical and proceed mostly by fifths; however, third-based progressions become increasingly prevalent within the studied period.

References

Modeling web data

TLDR
Extremely close agreement between observed vocabulary growth and Heaps' law, and reasonable agreement with Zipf's law for medium- to low-frequency terms, is found.

Empirical and Theoretical Bases of Zipf's Law

Let us start by considering a basic form of Zipf's law. Suppose one has a natural-language corpus, e.g., a book written in English. Next, suppose one makes a frequency count of the words in the

Empirical and Theoretical Bases of Zipf’s Law. In: Library Trends

  • 1981

Modeling Web Data The type-token relation

    The type-token relation

    • Quantitative Linguistics. An International Handbook
    • 2005