Quantitative Analysis of Culture Using Millions of Digitized Books

Linguistic and cultural changes are revealed through the analyses of words appearing in books. We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as… 
Measuring Linguistic and Cultural Evolution Using Books and Tweets
This dissertation uses hundreds of thousands of books spanning two centuries, scanned by Google, and over 100 billion messages posted to the social media platform Twitter over the course of a decade to study the English language and the evolution of culture and society as inferred from changes in language.
Identification of Literary Movements Using Complex Networks to Represent Texts
This study identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books.
How Does Scientific Progress Affect Cultural Changes? A Digital Text Analysis
We study the relationship between scientific and cultural change, two phenomena that the economics literature identifies as key drivers of long-term growth, but that have mostly been studied
Thousands of Titles Without Authors: Digitized Newspapers, Serial Fiction, and the Challenges of Anonymity
The study of literary anonymity and pseudonymity outside of our existing disciplinary infrastructure raises a variety of questions for book history scholars. How can we study areas of print culture
Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution
Overall, the findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
“The General Practice of the Nation”: Walt Whitman, Language, and Computerized Search in the Nineteenth-Century Archive
Word-search technologies have played a significant role in literary scholarship for decades, yet they have received little attention from literary theorists. This paper considers how we might more
The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings
It is argued that word embedding models are a useful tool for the study of culture, using a historical analysis of shared understandings of social class as an empirical case and specifying a relational model of meaning consistent with contemporary theories of culture.
Centuries of Sociology in Millions of Books
The Google Books N-gram corpus contains an enormous volume of digitized data, which, to the best of our knowledge, sociologists have yet to fully utilize. In this paper, we mine this data to shed
Extraction and Analysis of Character Interaction Networks From Plays and Movies
This project develops and applies methods for automatically extracting character interaction networks from works of entertainment and uses the properties of the resulting networks to draw conclusions about the works at hand.
Cultural models of development and of a developmental hierarchy of societies have powerfully shaped world history, providing motivation and justification for colonialism, religious evangelism,


Quantifying the evolutionary dynamics of language
This study provides a quantitative analysis of the regularization process by which ancestral forms gradually yield to an emerging linguistic rule, and studies how the rate of regularization depends on the frequency of word usage.
Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death
Analysis of dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800–2008 shows that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors.
Languages cool as they expand: Allometric scaling and the decreasing need for new words
The annual growth fluctuations of word use show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion.
Reflexes of grammar in patterns of language change
When one form replaces another over time in a changing language, the new form does not occur equally often in all linguistic contexts. Linguists have generally assumed that those contexts in
The Frenzy of Renown: Fame and Its History
"Remarkably ambitious . . . an impressive tour de force." --Washington Post Book World For Alexander the Great, fame meant accomplishing what no mortal had ever accomplished before. For Julius Caesar,
[Collective memory].
This workshop continues the discussion from the previous year on the recent developments in memory studies in terms of approaches, frameworks and methods and how these might be relevant to archival scholarship.
Implication Analysis: A Pragmatic Proposal for Linking Theory and Data in the Social Sciences
This work describes a set of procedures for using empirical data to rigorously evaluate theories and hypotheses without resorting to the mimicking of hard science.
Fifty years among the new words: a dictionary of neologisms, 1941-1991
Contents: Acknowledgments; Introduction: Collecting new words; The making of new words; The motives for new words; References; Index of new words with glosses; Index of contributors; Among the new words, 1941-91.
From Usage to Grammar: The Mind's Response to Repetition
It is argued that high-frequency instances of constructions undergo grammaticization processes (which produce further change), function as the central members of categories formed by constructions, and retain their old forms longer than lower-frequency instances under the pressure of newer formations.
The Computational Nature of Language Learning and Evolution
Partha Niyogi investigates the roles of natural selection, communicative efficiency, and learning in the origin and evolution of language -- in particular, whether natural selection is necessary for the emergence of shared languages.