Investigation of Word Senses over Time Using Linguistic Corpora

Christian Pölitz, Thomas Bartz, Katharina Morik, Angelika Storrer
Word sense induction is an important method for identifying the possible meanings of words. Word co-occurrences can be used to group word contexts into semantically related topics. Beyond the words themselves, temporal information provides another dimension for investigating how word meanings develop over time. Large digital corpora of written language, such as those held by the CLARIN-D centers, offer excellent possibilities for this kind of linguistic research on authentic language data. In…
Word embeddings: reliability & semantic change
The JeSemE website was created to make word-embedding-based diachronic research more accessible; the applicability of these methods is examined through the historical understanding of electricity and of words connected to Romanticism.
On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models
It is found that semantic change is linear in two senses: today’s embedding vector (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods.
LL(O)D and NLP perspectives on semantic change for humanities research
The aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST Action Nexus Linguarum, European network for Web-centred linguistic data science.
Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama
The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same levels of description, but consideration of genres as


Word-Sense Disambiguation Using Statistical Methods
A statistical technique for assigning senses to words is described; when it was incorporated into the statistical machine translation system, the system's error rate decreased by thirteen percent.
Bayesian Word Sense Induction
This work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses which are in turn characterized as distributions over words.
Inducing Word Senses to Improve Web Search Result Clustering
This work first acquires the senses of a query by means of a graph-based clustering algorithm that exploits cycles in the co-occurrence graph of the query, then clusters the search results based on their semantic similarity to the induced word senses.
Word sense disambiguation: A survey
This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.
Towards Tracking Semantic Change by Visual Analytics
The aim of this study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general.
Tony McEnery, Richard Xiao & Yukio Tono, Corpus-based language studies: An advanced resource book. London and New York: Routledge, 2006. Pp. xix, 386. Pb $33.95.
Originally associated mainly with work in lexicography and grammar, corpus linguistics has more recently established its relevance for a wide range of linguistic endeavors, including research into
Topics over time: a non-Markov continuous-time model of topical trends
An LDA-style topic model is presented that captures not only the low-dimensional structure of data, but also how the structure changes over time, showing improved topics, better timestamp prediction, and interpretable trends.
Finding scientific topics
T. Griffiths, M. Steyvers. Proceedings of the National Academy of Sciences of the United States of America, 2004.
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model. The model is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
Latent Dirichlet Allocation
Das Digitale Wörterbuch der Deutschen Sprache (DWDS)
It reached completion, and yet it did not: when the final installment of the Deutsches Wörterbuch appeared in 1960, it had long been clear that large parts of this enormous work