A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only are syntagmatic relations annotated in the text, but paradigmatic relations are also made explicit by generating lexical expansions. We operationalize distributional …
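The two-dimensional text idea can be sketched in a few lines: the token sequence runs along one axis, and per-token lexical expansions drawn from a distributional thesaurus run along the other. The thesaurus entries below are toy data invented for illustration, not the resource from the paper.

```python
# Assumed toy distributional thesaurus: top paradigmatically similar words
# per token (invented for illustration).
dt = {"cold": ["chilly", "icy"], "beer": ["ale", "lager"]}

def expand(sentence):
    """Render text 'two-dimensionally': the original token sequence forms the
    horizontal axis; lexical expansions per token form the vertical axis."""
    return [(tok, dt.get(tok, [])) for tok in sentence.split()]

for tok, alts in expand("a cold beer"):
    print(tok, "|", ", ".join(alts))
```

Tokens without a thesaurus entry simply get an empty expansion column, so the syntagmatic dimension is always preserved.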
In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional-thesaurus-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters …
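Word-centric sense clusters of this kind are commonly obtained by clustering a word's distributional-thesaurus neighborhood graph, e.g. with a Chinese Whispers-style label propagation. The following is a minimal stdlib-only sketch on an invented similarity graph around "mouse"; the edges, weights, and parameters are assumptions, not the paper's data.

```python
import random

# Toy distributional-thesaurus neighborhood for the target "mouse":
# edges connect similar neighbors; weights are similarity scores (all assumed).
edges = {
    ("rat", "hamster"): 0.8, ("rat", "rodent"): 0.7, ("hamster", "rodent"): 0.6,
    ("keyboard", "cursor"): 0.9, ("keyboard", "monitor"): 0.7, ("cursor", "monitor"): 0.5,
}

def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph: each node repeatedly adopts the
    label with the highest total edge weight among its neighbors."""
    rng = random.Random(seed)
    nbrs = {}
    for (u, v), w in edges.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    labels = {n: n for n in nbrs}          # start: every node is its own cluster
    nodes = list(nbrs)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for n in nodes:
            scores = {}
            for m, w in nbrs[n].items():
                scores[labels[m]] = scores.get(labels[m], 0.0) + w
            labels[n] = max(scores, key=scores.get)
    return labels

labels = chinese_whispers(edges)
# The "rodent" neighbors converge to one shared label, the "computer"
# neighbors to another: two sense clusters for "mouse".
```

Running the same clustering on graphs built from different time slices and comparing the resulting clusters is what exposes sense births and deaths over time.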
We introduce a new, highly scalable approach for computing Distributional Thesauri (DTs). By employing pruning techniques and a distributed framework, we make the computation feasible for very large corpora on comparably small computational resources. We demonstrate this by releasing a DT for the whole vocabulary of Google Books syntactic n-grams. Evaluating …
We present a new unsupervised mechanism that ranks word n-grams according to their multiwordness. It relies heavily on a new uniqueness measure that computes, based on a distributional thesaurus, how often an n-gram can be replaced in context by a single-word term. Together with a downweighting mechanism for incomplete terms, this forms a new …
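The core intuition can be sketched as a substitutability score: the fraction of an n-gram's observed contexts in which some single-word term also occurs. A high fraction suggests the n-gram behaves like a single lexical unit. The context data and the function name below are illustrative assumptions, not the measure's exact formulation.

```python
# Assumed toy data: (left word, right word) contexts in which each term was
# observed in a corpus; "hot dog" is the n-gram under scrutiny.
contexts = {
    "hot dog": {("ate", "with"), ("a", "stand"), ("grilled", "for")},
    "sausage": {("ate", "with"), ("a", "stand")},   # single-word substitute
}

def substitutability(ngram, single_words, contexts):
    """Fraction of the n-gram's contexts in which at least one single-word
    term also occurs: higher -> stronger multiword candidate."""
    ngram_ctx = contexts[ngram]
    covered = {c for w in single_words
               for c in contexts.get(w, ()) if c in ngram_ctx}
    return len(covered) / len(ngram_ctx)

print(substitutability("hot dog", ["sausage"], contexts))  # 2 of 3 contexts
```

In practice the candidate single-word substitutes would come from the distributional thesaurus entry of the n-gram rather than being supplied by hand.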
This paper introduces a web-based visualization framework for graph-based distributional semantic models. The visualization supports a wide range of data structures, including term similarities, similarities of contexts, support for multi-word expressions, sense clusters for terms, and sense labels. In contrast to other browsers of semantic resources, our …
In this paper we present a word decompounding method based on distributional semantics. Our method does not require any linguistic knowledge and is initialized from a large monolingual corpus. The core idea of our approach is that the parts of a compound (like "candle" and "stick") are semantically similar to the entire compound, which helps to …
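That core idea can be turned into a simple split scorer: try every split point and prefer the split whose parts share distributional neighbors with the whole compound. The neighbor sets and scoring function below are a toy simplification under assumed data, not the paper's actual model.

```python
# Assumed toy distributional neighbors (top similar words) per term.
neighbours = {
    "candlestick": {"candle", "lamp", "holder", "lantern"},
    "candle":      {"candlestick", "lamp", "wax", "flame"},
    "stick":       {"branch", "twig", "pole"},
}

def split_score(compound, left, right, neighbours):
    """Score a split by how much the parts' distributional neighborhoods
    overlap with the compound's (a set-overlap simplification)."""
    comp = neighbours.get(compound, set()) | {compound}
    score = 0
    for part in (left, right):
        part_n = neighbours.get(part, set()) | {part}
        score += len(comp & part_n)
    return score

def best_split(compound, neighbours, min_len=3):
    """Try every split point that leaves both parts at least min_len long."""
    splits = [(compound[:i], compound[i:])
              for i in range(min_len, len(compound) - min_len + 1)]
    return max(splits, key=lambda s: split_score(compound, s[0], s[1], neighbours))

print(best_split("candlestick", neighbours))  # ('candle', 'stick')
```

Spurious splits like "cand" + "lestick" score zero because their parts have no distributional neighbors in common with the compound.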