Aristomenis Thanopoulos

Corpus-based automatic extraction of collocations is typically carried out employing some statistic indicating concurrency in order to identify words that co-occur more often than expected by chance. In this paper we are concerned with some typical measures such as the t-score, Pearson's χ-square test, log-likelihood ratio, pointwise mutual information and …
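As a rough illustration of the measures named in the abstract above, the following minimal sketch computes them for a single word pair from raw counts. The abstract is truncated and does not give the paper's exact formulations, so the textbook definitions are assumed here; the function name `association_scores` and its parameters are hypothetical.

```python
import math

def association_scores(c12, c1, c2, n):
    """Score one candidate collocation (w1, w2).

    Assumed inputs (hypothetical names, not from the paper):
      c12 -- number of times w1 and w2 co-occur
      c1  -- number of occurrences of w1
      c2  -- number of occurrences of w2
      n   -- total number of pairs in the corpus
    Requires c12 > 0 and non-degenerate marginals (0 < c1 < n, 0 < c2 < n).
    """
    # Co-occurrence count expected if w1 and w2 were independent.
    expected = c1 * c2 / n

    # Pointwise mutual information, in bits.
    pmi = math.log2(c12 / expected)

    # t-score, using the common approximation variance ~ observed count.
    t_score = (c12 - expected) / math.sqrt(c12)

    # Observed 2x2 contingency table.
    o11 = c12                 # w1 with w2
    o12 = c1 - c12            # w1 without w2
    o21 = c2 - c12            # w2 without w1
    o22 = n - c1 - c2 + c12   # neither

    # Pearson's chi-square statistic for a 2x2 table.
    chi2 = n * (o11 * o22 - o12 * o21) ** 2 / (
        (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    )

    # Log-likelihood ratio: G2 = 2 * sum(O * ln(O / E)), skipping empty cells.
    observed = [o11, o12, o21, o22]
    row_totals = [c1, c1, n - c1, n - c1]
    col_totals = [c2, n - c2, c2, n - c2]
    g2 = 2 * sum(
        o * math.log(o * n / (r * c))
        for o, r, c in zip(observed, row_totals, col_totals)
        if o > 0
    )

    return {"pmi": pmi, "t_score": t_score, "chi2": chi2, "log_likelihood": g2}
```

When the observed count equals the count expected under independence, all four scores are zero; they grow as the pair co-occurs more often than chance predicts, though at different rates, which is why such measures can rank candidate collocations differently.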
This paper describes the automatic extraction of economic terminology from Modern Greek texts as a first step towards creating an ontological thesaurus of economic concepts. Unlike previous approaches, the domain-specific corpus utilized varies in genre, and is therefore rich in vocabulary and linguistic structure, while the pre-processing level is …
In this paper we address the problem of discovering semantic similarities between words via statistical processing of text corpora. We propose a knowledge-poor method that exploits the sentential context of words to extract similarity relations between them, as well as word clusters that are semantic in nature. The approach aims at full portability across domains and …
In this paper, a probabilistic framework for acquiring domain knowledge from heterogeneous corpora is introduced. The acquired information is used for intelligent human-computer interaction through the web. The application selected for experimenting with the framework was education on issues of chemotherapy of nosocomial and community-acquired pneumonia. The …
This paper describes Eksairesis, a system for learning economic domain knowledge automatically from Modern Greek text. The knowledge is in the form of economic terms and the semantic relations that govern them. The entire process is based on the use of minimal language-dependent tools, no external linguistic resources, and merely free, unstructured text. …
Previous approaches to the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theoretical perspective of contextual lexical semantics suggests that larger segments of text, specifically non-compositional multiwords, are more appropriate for this role. We experimentally tested the applicability of this …
In the present paper, we address the problem of identifying a user's plan, in terms of user modeling under uncertainty. Unlike the majority of existing natural language understanding engines, the presented framework automatically encodes the semantic representation of a user's query using a Bayesian network framework. The structure of the networks …