A framework for analyzing semantic change of words across time

@article{Jatowt2014AFF,
  title={A framework for analyzing semantic change of words across time},
  author={Adam Jatowt and Kevin Duh},
  journal={IEEE/ACM Joint Conference on Digital Libraries},
  year={2014},
  pages={229--238}
}
  • Adam Jatowt, Kevin Duh
  • Published 8 September 2014
  • Computer Science
  • IEEE/ACM Joint Conference on Digital Libraries
Recently, large amounts of historical text have been digitized and made accessible to the public. As a result, it has become possible for the first time to analyze the evolution of language using automatic approaches. In this paper, we show the results of an exploratory analysis that investigates methods for studying and visualizing changes in word meaning over time. In particular, we propose a framework for exploring semantic change at the lexical level, at the contrastive-pair…
Ngram in Detecting Linguistic Shifts over Time
TLDR
A new approach to quantitatively recognize semantic changes of words during the period between 1800 and 1990 is developed and it is shown that the system is more robust against morphological language variations.
Modeling the dynamics of domain specific terminology in diachronic corpora
TLDR
This work introduces the notion of context volatility as a new measure for detecting semantic change and applies it to key term extraction in a political science case study to identify periods of time that are characterised by intense controversial debates or substantial semantic transformations.
A larger-scale evaluation resource of terms and their shift direction for diachronic lexical semantics
TLDR
An English evaluation set which is larger, more varied, and more realistic than seen to date, with terms derived from a historical thesaurus is introduced, and it is shown that performance on the new data set is much lower than earlier reported findings, setting a new standard.
Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora
TLDR
This work proposes an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word, and demonstrates its effectiveness in 9 different setups, considering different corpus splitting criteria.
Survey of Computational Approaches to Lexical Semantic Change
TLDR
This article focuses on diachronic conceptual change as an extension of semantic change and presents a survey, currently under review, of recent computational techniques to tackle lexical semantic change.
Identify Shifts of Word Semantics through Bayesian Surprise
TLDR
This paper presents a novel computational method that explicitly establishes the stable topological structure of word semantics and identifies the surprising changes in the semantic space over time through a principled statistical method.
Semantic Change Detection With Gaussian Word Embeddings
TLDR
This work proposes a Gaussian word embedding (w2g)-based method and presents a comprehensive study of LSC detection under the SemEval-2020 Task 1 evaluation framework as well as on the Google N-gram corpus.
Improving semantic change analysis by combining word embeddings and word frequencies
TLDR
This article proposes SCAF, Semantic Change Analysis with Frequency, which abstracts from the concrete embeddings and includes word frequencies as an orthogonal feature and leverages existing approaches for time series analysis, by using change detection methods to identify semantic shifts.
A Critical Assessment of a Method for Detecting Diachronic Meaning Shifts: Lessons Learnt from Experiments on Dutch
TLDR
This work trains diachronic word embeddings on Dutch newspaper data and compares representations of the same terms from different times to each other to verify whether such comparison can highlight the emergence of a new (figurative) meaning for a given term.
Exploring Semantic Change of Chinese Word Using Crawled Web Data
TLDR
Three different word representation methods are extended to include temporal information, then trained and tested on the large amount of data provided by Sogou, a Chinese web search engine provider.

References

SHOWING 1-10 OF 37 REFERENCES
Large scale analysis of changes in English vocabulary over recent time
TLDR
The results of large-scale studies on the usage of words and the evolution of English language vocabulary over the last two centuries are reported to help with understanding its impact on readability and retrieval of historical documents.
Diachronic Variation in Grammatical Relations
TLDR
Inspired by the econometric technique of measuring return and volatility instead of relative frequencies, these techniques are proposed as a way to better characterize changes in grammatical patterns like nominalization, modification and comparison to better inform intuitions about specialist domains and changes in language use as a whole.
NEER: An Unsupervised Method for Named Entity Evolution Recognition
TLDR
This work proposes NEER, an unsupervised method for named entity evolution recognition independent of external knowledge sources, and finds time periods with high likelihood of evolution, using a sliding window co-occurrence method.
Oxford Dictionary of Word Origins
Over 3,000 entries Newly updated to incorporate recent additions to the English language, this popular dictionary provides a fascinating exploration of the origins and development of words in the
Distributional Structure
TLDR
This work discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.
A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books
TLDR
A dataset of syntactic-ngrams (counted dependency-tree fragments) based on a corpus of 3.5 million English books includes temporal information, facilitating new kinds of research into lexical semantics over time.
The Corpus of Contemporary American English as the first reliable monitor corpus of English
The Corpus of Contemporary American English is the first large, genre-balanced corpus of any language, which has been designed and constructed from the ground up as a ‘monitor corpus’, and which can
Short term diachronic shifts in part-of-speech frequencies: a comparison of the tagged LOB and F-LOB corpora.
TLDR
It is shown that while part-of-speech frequencies have not remained constant over the period investigated, the shifts are usually not big enough to invalidate the results obtained in analyses of the untagged material.
From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales
TLDR
It is shown how sentiment analysis can be used in tandem with effective visualizations to quantify and track emotions in both individual books and across very large collections.
Word Epoch Disambiguation: Finding How Words Change Over Time
TLDR
The novel task of "word epoch disambiguation," defined as the problem of identifying changes in word usage over time, is introduced and it is shown that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time.