Vector-based models of lexical semantics retrieve semantically related words automatically from large corpora by exploiting the property that words with a similar meaning tend to occur in similar contexts. Despite the increasing popularity of these models, it remains unclear which kinds of semantic similarity they actually capture, and for which kinds of words. In this paper, we …
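The distributional property described above (similar words occur in similar contexts) can be illustrated with a minimal sketch. It assumes plain co-occurrence counts inside a fixed symmetric window and cosine similarity between the resulting context vectors; the toy corpus, the window size and all function names are hypothetical and are not the setup used in the paper.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For every word, count the words occurring within `window` positions of it (hypothetical setup)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=3):
    """Rank all other words by the similarity of their context vectors to `word`."""
    scores = [(other, cosine(vectors[word], vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy corpus, far too small for real semantic modelling.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "a cat chases a mouse",
    "a dog chases a cat",
]
vectors = context_vectors(corpus, window=2)
print(nearest_neighbours("cat", vectors))
```

Even on this toy corpus, "cat" and "dog" end up with a much higher cosine than "cat" and "milk", because they share contexts ("the", "a", "drinks", "chases"). Note that the neighbours retrieved this way are merely distributionally similar; whether they are synonyms, co-hyponyms or topically related words is exactly the question the abstract raises.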
In statistical NLP, Semantic Vector Spaces (SVS) are the standard technique for the automatic modeling of lexical semantics. However, it is largely unclear exactly how these black-box techniques capture word meaning. To explore the way an SVS structures the individual occurrences of words, we use a non-parametric MDS solution of a token-by-token similarity …
The language of IRC (Internet Relay Chat) is in many respects an example of "spoken language in written form": although produced in a written medium, it shares with spoken language a dialogical immediacy that ordinary written text usually lacks, and as a result it tends to appear highly informal, even to the untrained observer. Linguists should …
Over the last decade, the Leuven Research Unit of Quantitative Lexicology and Variational Linguistics has developed a corpus-based method for the investigation of region and register variation in and between the national variants of Dutch, viz. Belgian Dutch and Netherlandic Dutch. The basic characteristics of the methodology are the following. Geeraerts, …
Computers and the Humanities (Kluwer Academic Publishers). Abstract: In this text we present "profile-based linguistic uniformity", a method designed to compare language varieties on the basis of a wide range of potentially heterogeneous linguistic variables. In many respects a parallel can be drawn …
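The abstract breaks off before the method is spelled out. The sketch below assumes one straightforward way to compare two varieties on a "profile", taken here to mean the relative frequencies of the alternative expressions a variety uses for the same concept: uniformity is the summed overlap of the two profiles (0 = no shared usage, 1 = identical usage), averaged over concepts. All counts, concept labels and function names are hypothetical, and the paper's own measure and weighting may differ.

```python
def relative_profile(counts):
    """Turn raw counts of alternative terms for one concept into relative frequencies."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def uniformity(profile_a, profile_b):
    """Assumed overlap measure: for each term, take the smaller relative frequency, then sum (0..1)."""
    terms = set(profile_a) | set(profile_b)
    return sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in terms)


def mean_uniformity(counts_a, counts_b):
    """Unweighted mean over the concepts both varieties attest (a simplifying assumption)."""
    shared = set(counts_a) & set(counts_b)
    scores = [
        uniformity(relative_profile(counts_a[c]), relative_profile(counts_b[c]))
        for c in shared
    ]
    return sum(scores) / len(scores)


# Hypothetical counts: concept -> {alternative term: frequency} for two varieties.
variety_a = {
    "concept_1": {"term_a": 60, "term_b": 40},
    "concept_2": {"term_c": 50, "term_d": 50},
}
variety_b = {
    "concept_1": {"term_a": 20, "term_b": 80},
    "concept_2": {"term_c": 50, "term_d": 50},
}

u = mean_uniformity(variety_a, variety_b)
print(f"uniformity = {u:.2f}, distance = {1 - u:.2f}")
```

Because each concept is compared only through the relative frequencies of its own alternatives, variables of very different kinds (lexical, grammatical, phonological) can in principle be put on the same 0..1 scale, which is the sense in which such profiles allow heterogeneous variables to be aggregated.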