How many words do children know? A corpus-based estimation of children’s total vocabulary size

Jutta Segbers and Sascha Schroeder. Language Testing, pp. 297-320.

In this article we present a new method for estimating children’s total vocabulary size based on a language corpus in German. We drew a virtual sample of different lexicon sizes from a corpus and let the virtual sample “take” a vocabulary test by comparing whether the items were included in the virtual lexicons or not. This enabled us to identify the relation between test performance and total lexicon size. We then applied this relation to the test results of a real sample of children (grades 1… 
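
The calibration idea described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' procedure: the toy corpus, the frequency-weighted sampling rule, the linear interpolation, and all function names (`build_toy_corpus`, `draw_virtual_lexicon`, `take_test`, `calibrate`, `estimate_size`) are hypothetical and chosen for illustration.

```python
import random


def build_toy_corpus(n_types=2000):
    """Hypothetical corpus: word types with Zipf-like relative frequencies."""
    return {f"w{rank}": 1.0 / rank for rank in range(1, n_types + 1)}


def draw_virtual_lexicon(corpus, size, rng):
    """Draw `size` distinct word types, favouring frequent ones (assumption).

    Each word gets an exponential waiting time whose rate is its frequency;
    keeping the earliest words is weighted sampling without replacement.
    """
    timed = sorted(
        (rng.expovariate(1.0) / freq, word) for word, freq in corpus.items()
    )
    return {word for _, word in timed[:size]}


def take_test(lexicon, test_items):
    """Score a virtual lexicon: number of test items it contains."""
    return sum(item in lexicon for item in test_items)


def calibrate(corpus, test_items, sizes, n_rep, rng):
    """Return (mean test score, lexicon size) pairs, one per lexicon size."""
    curve = []
    for size in sizes:
        scores = [
            take_test(draw_virtual_lexicon(corpus, size, rng), test_items)
            for _ in range(n_rep)
        ]
        curve.append((sum(scores) / n_rep, size))
    return curve


def estimate_size(score, curve):
    """Map an observed test score to a lexicon size by linear interpolation."""
    points = sorted(curve)
    if score <= points[0][0]:
        return points[0][1]
    for (s0, n0), (s1, n1) in zip(points, points[1:]):
        if s0 <= score <= s1:
            if s1 == s0:
                return (n0 + n1) / 2
            return n0 + (n1 - n0) * (score - s0) / (s1 - s0)
    return points[-1][1]


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = build_toy_corpus()
    # Hypothetical test: every 50th word type by frequency rank.
    items = [f"w{rank}" for rank in range(1, 2000, 50)]
    curve = calibrate(corpus, items, [100, 500, 1000, 1500], 10, rng)
    print(estimate_size(20, curve))
```

In this sketch, the calibration curve plays the role of the relation between test performance and total lexicon size; a real child's test score would be passed to `estimate_size` to read off an estimated vocabulary size.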

How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age

Based on an analysis of the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200

Investigating Developmental Trajectories of Morphemes as Reading Units in German

Results imply that readers of German are sensitive to morphology in very early stages of reading acquisition with trajectories depending on morphological type and vocabulary knowledge, and children with higher vocabulary knowledge benefit earlier in development and to a greater extent from morphology.

Compound Reading in German: Effects of Constituent Frequency and Whole-Word Frequency in Children and Adults

The results suggest that developing readers already decompose compounds and that hybrid interactive models of morphological processing are most suitable to explain compound recognition across development.

Does morphological structure modulate access to embedded word meaning in child readers?

Category decisions made by Italian elementary school children revealed that words were harder to reject as members of a category when the embedded stem was category-congruent, suggesting that orthographic stems are activated and that this activation is fed forward to the semantic level regardless of morphological structure.

Orthographic Networks in the Developing Mental Lexicon. Insights From Graph Theory and Implications for the Study of Language Processing

Results show that, similar to semantic and phonological networks, orthographic networks possess small-world characteristics defined by short average path lengths between nodes and strong local clustering.

Syllables and morphemes in German reading development: Evidence from second graders, fourth graders, and adults

Children have been found to use units such as syllables and morphemes in fine-grained reading processes, before they transition to a coarse-grained, holistic route. Which units they prefer

Morphological Priming in Children: Disentangling the Effects of School-grade and Reading Skill

Masked priming studies have shown that readers decompose morphologically complex words (read+er). Interindividual differences have been suggested to affect this phenomenon. However, its

Development of unfamiliar accent comprehension continues through adolescence

Tessa Bent. Linguistics. Journal of Child Language, 2018.
School-age children's understanding of unfamiliar accents is not adult-like, and the age at which this ability fully matures is unknown; adult-like comprehension may require greater exposure to linguistic variability or additional cognitive–linguistic growth.

Zipf's law revisited: Spoken dialog, linguistic units, parameters, and the principle of least effort.

The ubiquitous inverse relationship between word frequency and word rank is commonly known as Zipf's law. The theoretical underpinning of this law states that the inverse relationship yields

Developmental Trajectories in the Understanding of Everyday Uncertainty Terms

Dealing with uncertainty and different degrees of frequency and probability is critical in many everyday activities. However, relevant information does not always come in the form of numerical



Vocabulary simplification for children: a special case of ‘motherese’?

A new corpus of spontaneous conversations between adults and children is examined for evidence that adults simplify their vocabulary choices when speaking with young children; the analysis finds that adults do not choose their words from the 10,000 most common word-types in English in an age-dependent manner.

Does frequency count? Parental input and the acquisition of vocabulary

Studies examining factors that influence when words are learned typically investigate one lexical category or a small set of words. We provide the first evaluation of the relation between

Early vocabulary growth: Relation to language input and gender.

This study examines the role of exposure to speech in children's early vocabulary growth. It is generally assumed that individual differences in vocabulary depend, in large part, on variations in

Estimating the Size of Vocabularies of Children and Adults

One of the oldest problems in educational research has been the estimation of vocabulary size of children and adults. Estimates for the same age groups vary considerably; some of the more recent

Early lexical development in German: a study on vocabulary growth and vocabulary composition during the second and third year of life

The study describes the development of various categories of words and questions the preponderance of nouns in spontaneous speech, and trend analyses clarify characteristic developmental patterns in regard to certain word categories.

Why are some verbs learned before other verbs? Effects of input frequency and structure on children's early verb use

The effect of syntactic diversity in input provides support for the syntactic bootstrapping account of how children use structural information to learn the meaning of new verbs, suggesting that the way verbs appear in input influences their ease of acquisition.

childLex: a lexical database of German read by children

This article introduces childLex, an online database of German read by children. childLex is based on a corpus of children’s books and comprises 10 million words that were syntactically annotated and


In attempting to answer the apparently simple question, "How many words do you know," numerous investigations have discovered that individual vocabularies, from childhood to maturity, are much larger

Toward a Meaningful Definition of Vocabulary Size

Studies using dictionary-sampling methods to estimate vocabulary size have left a bewildering trail of widely differing estimates. We argue that many estimates are misleading (generally too high)