Contributions to the science of text and language : word length studies and related issues

Peter Grzybek
Preface
Peter Grzybek / On the Science of Language in Light of the Language of Science
Peter Grzybek / History and Methodology of Word Length Studies
Simone Andersen and Gabriel Altmann / Information Content of Words in Texts
Gordana Antic, Emmerich Kelih and Peter Grzybek / Zero-syllable Words in Determining Word Length
August Fenk and Gertraud Fenk-Oczlon / Within-Sentence Distribution and Retention of Content Words and Function Words
Primoz Jakopin / On Text Corpora, Word Lengths, and Word…

Word Classes and Word Order

A statistical reanalysis using the Wilcoxon test found significantly higher relative recall scores for content words than for function words in all three parts of the sentences. It also supports an increased generality of a covering law stating that the initial positions of a string, which are per se of higher informational content, should be occupied by items of high frequency and low informational content.

Peter Grzybek (ed.). 2007. Contributions to the Science of Text and Language. Word Length Studies and Related Issues (Text, Speech and Technology Series 31). Berlin, Heidelberg: Springer. xii, 352 pp.

The majority of the 17 contributions to this volume have their origin in a conference held near Graz in 2002 at the beginning of a project funded by the FWF, whose purpose was to study word length and word length frequencies.

Analysis for the significance of statistical word-length features in genre discrimination of Hindi texts

An attempt has been made to test the contribution of quantitative word length features to the classification of written Hindi texts, using quantitative measures extracted from word length profiles and frequencies.
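
As a rough illustration of what a word length profile can look like, the sketch below counts how often each word length occurs in a text and derives the mean length. Measuring length in characters of whitespace-separated tokens is an assumption made here for simplicity (the unit used in the study is not stated above), and the function name `word_length_profile` and the sample sentence are hypothetical.

```python
from collections import Counter


def word_length_profile(text):
    """Return (relative frequency per word length, mean word length).

    Assumption: a "word" is a whitespace-separated token stripped of
    surrounding punctuation, and its length is counted in characters.
    """
    tokens = [t.strip(".,;:!?\"'()") for t in text.split()]
    lengths = [len(t) for t in tokens if t]
    if not lengths:
        raise ValueError("text contains no words")
    counts = Counter(lengths)
    total = len(lengths)
    # Relative frequency of each observed length, in ascending order of length.
    profile = {length: counts[length] / total for length in sorted(counts)}
    mean_length = sum(lengths) / total
    return profile, mean_length


# Hypothetical sample text, for illustration only.
profile, mean_length = word_length_profile("The cat sat on the mat.")
```

Features of this kind (relative frequencies per length class, mean length) could then serve as input to a classifier. For a script such as Devanagari, character counts would include combining marks, so a syllable-based unit may be more appropriate.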

How Does Word Length Evolve in Written Chinese?

It is concluded that the disyllabic trend may account for the increase in word length, and that its impact can be explained by the "principle of least effort".

Local grammars in word counting

This paper investigates to what extent it is possible to transform a text into a precisely annotated linguistic object by the use of formal methods only, and shows how FSTs can be used to obtain a linguistically annotated text that is both rich in linguistic information and precise.

Word Length and Frequency Distributions in Different Text Genres

The Singh-Poisson distribution appears to be the best choice for both problems: first, it is an appropriate model for three of the text sorts (private letters, journalistic texts and poems); and second, the parameter space of the model can be split into regions constituting all four text sorts.
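
For orientation, the following sketch evaluates a Singh-Poisson probability function in the form commonly given for it: a Poisson distribution whose first class is modified by a weight parameter. The 1-displaced form (word lengths starting at 1), the parameter names `a` and `alpha`, and the numeric values are assumptions for illustration, not figures from the study.

```python
import math


def singh_poisson_pmf(x, a, alpha):
    """Probability of word length x under a 1-displaced Singh-Poisson model.

    Assumed parameterisation:
        P(1) = 1 - alpha + alpha * exp(-a)
        P(x) = alpha * a**(x - 1) * exp(-a) / (x - 1)!   for x = 2, 3, ...
    With alpha = 1 this reduces to the 1-displaced Poisson distribution.
    """
    if x < 1:
        return 0.0
    if x == 1:
        return 1.0 - alpha + alpha * math.exp(-a)
    return alpha * a ** (x - 1) * math.exp(-a) / math.factorial(x - 1)


# Hypothetical parameter values, chosen only to show the shape.
probs = [singh_poisson_pmf(x, a=1.2, alpha=0.9) for x in range(1, 30)]
```

Because each text yields one fitted pair of parameters, texts can be plotted as points in the two-dimensional parameter space, which is what makes a split into regions per text sort possible.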

Word Length Distribution in Zhuang Language

The study of the word length-frequency relationship of Zhuang indicates that Zhuang word length is influenced by word frequency, which can be explained by Zipf’s ‘Principle of Least Effort’ and thus follows the law of the lexical subsystem in synergetic linguistics.

How to Measure Word Length in Spoken and Written Chinese

Empirical word length distribution models, synergetic linguistic theories and Menzerath’s law are used in this study, and results show that the syllable is the most appropriate measurement unit for spoken Chinese, and the component the most appropriate measurement unit for written Chinese.



Quantitative Linguistics and Complex System Studies

Linguistic discourses treated as maximum entropy systems of words according to prescriptions of algorithmic information theory are shown to give a natural explanation of Zipf's law with quantitative rigor and are likely to be valid for diverse complex systems of nature.
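
Zipf's law says that the frequency of a word is roughly inversely proportional to its frequency rank, f(r) ≈ C / r^s with s close to 1. The sketch below is not the maximum entropy derivation summarised above. It only shows one simple way to check the law empirically, by fitting a straight line to log frequency against log rank. The function names are hypothetical, and ordinary least squares on log-log data is a crude estimator chosen here for brevity.

```python
import math
from collections import Counter


def word_frequencies(text):
    """Count occurrences of each lowercased, whitespace-separated token."""
    return list(Counter(text.lower().split()).values())


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s by least squares on log f versus log r."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying curve; s is its negation.
    return -sxy / sxx


# Idealised frequencies that follow f(r) = 1000 / r exactly.
ideal = [1000 / r for r in range(1, 51)]
s = zipf_exponent(ideal)
```

On real text, `zipf_exponent(word_frequencies(text))` gives a rough estimate only; the low-frequency tail, where many words share the same count, tends to distort the fit.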

Towards a Theory of Word Length Distribution

The compound Poisson and Ord families of distributions seem to be adequate for modeling word length distributions, and the relationship of word length to other language phenomena is discussed.

How nature works: The science of self-organized criticality

His ruthless simplifications of geology, evolution, and neurology pay off because his models describe behavior that is common across these domains, and this universality means that trampling across others' turf is not only acceptable, but almost mandatory, if the underlying principles are to be exposed.