Dirk Geeraerts

Vector-based models of lexical semantics retrieve semantically related words automatically from large corpora by exploiting the property that words with a similar meaning tend to occur in similar contexts. Despite their increasing popularity, it is unclear which kind of semantic similarity they actually capture and for which kind of words. In this paper, we …
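The distributional idea in the abstract above (words with similar meanings occur in similar contexts) can be illustrated with a minimal sketch. This is not the authors' model: the toy corpus, the window size, the function names `context_vectors` and `cosine`, and the use of raw co-occurrence counts without any weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for each word, the words found within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context = neighbours on both sides, excluding the target itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(target, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, far too small for real use.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

vecs = context_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_fuel = cosine(vecs["cat"], vecs["fuel"])
print(sim_cat_dog > sim_cat_fuel)
```

In this toy setting "cat" and "dog" share most of their contexts and therefore score higher than "cat" and "fuel", which share none. Real models of this kind are typically built from far larger corpora and usually apply some weighting to the raw counts.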
The language of IRC – Internet Relay Chat – is in many respects an example of "spoken language in written form": although produced in a written medium, it shares with spoken language a dialogical immediacy that ordinary written text usually lacks, as a result of which it tends to appear highly informal, even to the untrained observer. Linguists should …
In statistical NLP, Semantic Vector Spaces (SVS) are the standard technique for the automatic modeling of lexical semantics. However, it is largely unclear exactly how these black-box techniques capture word meaning. To explore the way an SVS structures the individual occurrences of words, we use a non-parametric MDS solution of a token-by-token similarity …
Over the last decade, the Leuven Research Unit of Quantitative Lexicology and Variational Linguistics has developed a corpus-based method for the investigation of region and register variation in and between the national variants of Dutch, viz. Belgian Dutch and Netherlandic Dutch. The basic characteristics of the methodology are the following. Geeraerts, …
Semantic similarity is a key issue in many computational tasks. This paper goes into the development and evaluation of two common ways of automatically calculating the semantic similarity between two words. On the one hand, such methods may depend on a manually constructed thesaurus like (Euro)WordNet. Their performance is often evaluated on the basis of a …