Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English

@article{Brysbaert2009MovingBK,
  title={Moving beyond Ku{\v{c}}era and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English},
  author={Marc Brysbaert and Boris New},
  journal={Behavior Research Methods},
  year={2009},
  volume={41},
  pages={977--990}
}
Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kučera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how bad this measure is and what must be done to improve it. In particular, we investigated the size of the corpus…
The word frequency effect: a review of recent developments and implications for the choice of frequency estimates in German.
TLDR
It is found that the commonly used Celex frequencies are the least powerful to predict lexical decision times in the German language.
Do the effects of subjective frequency and age of acquisition survive better word frequency norms?
TLDR
Analysis of reading aloud and lexical decision reaction times and accuracy rates for 2,336 words suggests that models of word processing need to utilize recently developed frequency estimates during training or setting baseline activation levels in the lexicon.
SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
TLDR
This database of word and character frequencies based on a corpus of film and television subtitles is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used.
Assessing the Usefulness of Google Books’ Word Frequencies for Psycholinguistic Research on Word Processing
TLDR
It is found that, despite the massive corpus on which the Google estimates are based, the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles.
SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles
TLDR
A new database of Dutch word frequencies based on film and television subtitles is presented, and an accessibility measure based on contextual diversity explains more of the variance in accuracy and RT than does the raw frequency of occurrence counts.
Subtlex-pl: subtitle-based word frequency estimates for Polish
TLDR
The results suggest that the two corpora may have unequal potential for explaining human performance for words in different frequency ranges and that corpora based on written materials severely overestimate frequencies for formal words.
Dealing with zero word frequencies: A review of the existing rules of thumb and a suggestion for an evidence-based choice
TLDR
A critical review of the heuristics used to deal with zero word frequencies shows that four are suboptimal, one is good, and one may be acceptable, and the Laplace transformation gives the most useful estimates.
Subtitle-Based Word Frequencies as the Best Estimate of Reading Behavior: The Case of Greek
TLDR
Examination of SUBTLEX-GR, a subtitle-based corpus consisting of more than 27 million Modern Greek words, showed that frequencies estimated from a subtitle corpus explained the obtained results significantly better than traditional frequencies derived from written corpora.
Oral frequency norms for 67,979 Spanish words
TLDR
Validity analyses showed significant correlations of oral frequency with other frequency measures and suggest that oral frequency can predict some types of lexical processing with the same or higher levels of precision, when contrasted with text- or subtitle-based frequencies.
Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment
TLDR
It is shown that corpus word frequency and prevalence are complementary measures of word occurrence covering a broad range of language experiences, and prevalence is shown to be the strongest independent predictor of word processing times in the Dutch Lexicon Project, making it an important variable for psycholinguistic research.

References

The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis
TLDR
Word frequency estimates from the Brown corpus were compared with those from a 131-million-word corpus (the HAL corpus; conversational text gathered from Usenet) in a standard word naming task with 32 subjects; RT was predicted equally well by both corpora for high-frequency words, but the larger corpus provided better predictors for low- and medium-frequency words.
Spoken word frequency counts based on 1.6 million words in American English
TLDR
The present article reports the construction of a 1.6-million-word spoken frequency database derived from the Michigan Corpus of Academic Spoken English (Simpson, Swales, & Briggs, 2002), assesses the predictive validity of these counts, and discusses some possible applications outside of word recognition studies.
Contextual Diversity, Not Word Frequency, Determines Word-Naming and Lexical Decision Times
TLDR
It is argued that the results reflect the importance of likely need in memory processes, and that the continuity between reading and memory suggests using principles from memory research to inform theories of reading.
Word Frequencies in Written and Spoken English: based on the British National Corpus
Resulting from interdisciplinary research with Linguistics, this book addressed limitations of earlier word frequency dictionaries of English, namely those of sample size and breadth. It supersedes previous…
The use of film subtitles to estimate word frequencies
We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may present a fast and easy way…
WordGen: A tool for word selection and nonword generation in Dutch, English, German, and French
TLDR
This work uses the CELEX and Lexique lexical databases for word selection and nonword generation in Dutch, English, German, and French, producing items for psycholinguistic experiments, including experiments on bilingualism.
Reexamining the word length effect in visual word recognition: New evidence from the English Lexicon Project
TLDR
The effect of word length (number of letters in a word) on lexical decision was reexamined using the English Lexicon Project and an unexpected pattern of results taking the form of a U-shaped curve was revealed.
Morphological Decomposition and the Reverse Base Frequency Effect
  • M. Taft
  • Psychology, Medicine
  • The Quarterly journal of experimental psychology. A, Human experimental psychology
  • 2004
TLDR
Two experiments are reported here that demonstrate how an obligatory decomposition account can handle the absence of base frequency effects, and it is shown that the later stage of recombining the stem and affix is harder for high base frequency words than for lower base frequency words when matched on surface frequency, and that this can counterbalance the advantage of easier access to the higher frequency stem.
Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage.
  • D. Balota, J. Chumbley
  • Psychology, Medicine
  • Journal of experimental psychology. Human perception and performance
  • 1984
TLDR
It is argued that decision processes having little to do with lexical access accentuate the word-frequency effect in the lexical decision task and that results from this task have questionable value in testing the assumption that word frequency orders the lexicon, thereby affecting time to access the mental lexicon.
Using Internet search engines to estimate word frequency
TLDR
The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kučera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period of time.