Rethinking language: How probabilities shape the words we use

  • Thomas L. Griffiths
  • Published 23 February 2011
  • Computer Science
  • Proceedings of the National Academy of Sciences, pages 3825–3826
If you think about the classes you expect to take when studying linguistics in graduate school, probability theory is unlikely to be on the list. However, recent work in linguistics and cognitive science has begun to show that probability theory, combined with the methods of computer science and statistics, is surprisingly effective in explaining aspects of how people produce and interpret sentences (1–3), how language might be learned (4–6), and how words change over time (7, 8). The paper by… 

The Intersection between Linguistic Theories and Computational Linguistics over time

Recent achievements have turned computational linguistics into a dynamic research area and an important field of application development that is being explored by leading technology

Probabilistic language models in cognitive neuroscience: Promises and pitfalls

Emergent linguistic structure in artificial neural networks trained by self-supervision

Methods for identifying linguistic hierarchical structure emergent in artificial neural networks are developed and it is shown that components in these models focus on syntactic grammatical relationships and anaphoric coreference, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists.

Positive words carry less information than negative words

Taking into account the frequency of word usage, it is found that words with positive emotional content are used more frequently, which lends support to the Pollyanna hypothesis that there is a positive bias in human expression.

The distribution of information content in English sentences

The results suggest that the hypotheses of Constant Entropy Rate and Uniform Information Density do not hold for the sentence-medial positions and the context of a word in a sentence should not be simply defined as all the words preceding it in the same sentence.

Information gain modulates brain activity evoked by reading

Information gain, an information theoretic measure that quantifies the specificity of a word given its topic context, modulates word-synchronised brain activity in the EEG and suggests that biological information processing seeks to maximise performance subject to constraints on information capacity.

The Role of Frequency in the Processing of giving and receiving Events in Korean

This study aimed to examine the processing benefits of frequency information associated with the case marker -eykey in comprehending Korean declarative sentences. By using a picture description task

A concept of semantics extraction from web data by induction of fuzzy ontologies

This paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification, intended as an extension to the prevailing top-down ontologies.

Induktive unscharfe Datenassoziierung — Fallstudie enersis suisse AG

The paper establishes the connection to Web monitoring by outlining a corresponding inductive and gradual concept association for the analysis of Web data.



Knowledge of Grammar, Knowledge of Usage: Syntactic Probabilities Affect Pronunciation Variation

Frequent words tend to shorten (see e.g. Schuchardt 1885, Hooper 1976), as do words that have a high probability of occurrence given a neighboring word (Jurafsky et al. 2001). This tendency has been

Probabilistic models of word order and syntactic discontinuity

The thesis proposes a theory of expectation-based processing difficulty as a consequence of probabilistic syntactic disambiguation, and shows that the expectation-based theory matches a range of established experimental psycholinguistic results better than locality-based theories.

Formal grammar and information theory: together again?

  • Fernando C Pereira
  • Linguistics
  • Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences
  • 2000
In the last 40 years, research on models of spoken and written language has been split between two seemingly irreconcilable traditions: formal linguistics in the Chomsky tradition, and information

Word learning as Bayesian inference.

The authors present a Bayesian framework for understanding how adults and children learn the meanings of words. The theory explains how learners can generalize meaningfully from just one or a few

Word lengths are optimized for efficient communication

It is shown across 10 languages that average information content is a much better predictor of word length than frequency, which indicates that human lexicons are efficiently structured for communication by taking into account interword statistical dependencies.
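
To make "average information content" concrete, the sketch below computes, for each word, the mean of -log2 P(word | previous word) over all of its occurrences. It is only an illustration under stated assumptions: the bigram context, the unsmoothed counts, the toy corpus, and the function name `average_information_content` are all hypothetical choices made here, not the procedure used in the paper summarized above.

```python
import math
from collections import Counter, defaultdict


def average_information_content(sentences):
    """Return {word: mean of -log2 P(word | previous word)} over all occurrences.

    Assumption: P is an unsmoothed maximum-likelihood bigram estimate
    taken from `sentences` themselves (hypothetical toy setup).
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # "<s>" marks the sentence start so the first word also has a context.
        tokens = ["<s>"] + sentence.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    totals = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in bigram_counts.items():
        surprisal = -math.log2(n / context_counts[prev])  # bits for this context
        totals[word] += n * surprisal
        occurrences[word] += n
    return {word: totals[word] / occurrences[word] for word in totals}


# Toy corpus, invented for illustration only.
corpus = [
    "the dog barked",
    "the dog slept",
    "the cat slept",
    "a cat barked",
]
info = average_information_content(corpus)
for word in sorted(info, key=info.get):
    print(f"{word:>7}  {info[word]:.3f} bits")
```

In this toy corpus "the" is highly predictable at the start of a sentence and therefore carries less information on average than "a". Testing the claim about word length would additionally require correlating such values with word lengths over a large corpus.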

A New Statistical Parser Based on Bigram Lexical Dependencies

A new statistical parser is presented that is based on probabilities of dependencies between head-words in the parse tree; it trains on 40,000 sentences in under 15 minutes, and its speed can be improved to over 200 sentences a minute with negligible loss in accuracy.

Improved Reconstruction of Protolanguage Word Forms

An unsupervised approach to reconstructing ancient word forms is presented. Two kinds of features are added: markedness features, which model well-formedness within each language, and universal features, which support generalizations across languages.

A Probabilistic Model of Semantic Plausibility in Sentence Processing

A wide-coverage model that both assigns thematic roles to verb-argument pairs and determines a preferred interpretation by evaluating the plausibility of the resulting (verb, role, argument) triples is proposed.

Entropy Rate Constancy in Text

A constancy rate principle governing language generation implies that local measures of entropy (ignoring context) should increase with the sentence number, and it is demonstrated that this is indeed the case by measuring entropy in three different ways.