Aleksander Smywinski-Pohl

This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well at detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure inspired by Normalized Google Distance with a measure inspired by the Jaccard coefficient and taking into …
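To make the contrast between the two kinds of measure concrete, here is a minimal sketch. It assumes each article is represented by the set of articles linking to it, which is how Normalized Google Distance is commonly adapted to Wikipedia. The function names, the cap at 1.0 for articles with no shared links, and the sample data are illustrative assumptions, not the measures as defined in the paper.

```python
import math


def ngd_distance(links_a, links_b, total_articles):
    """Distance in the style of Normalized Google Distance over link sets.

    Lower values mean the two articles are more closely related.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        # Assumption: treat articles with no shared links as maximally distant.
        return 1.0
    numerator = math.log(max(len(a), len(b))) - math.log(len(common))
    denominator = math.log(total_articles) - math.log(min(len(a), len(b)))
    # Guard against a degenerate collection where the denominator vanishes.
    return numerator / denominator if denominator > 0 else 0.0


def jaccard_similarity(links_a, links_b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(links_a), set(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical link sets for two articles in a collection of 100 articles.
article_x = {1, 2, 3}
article_y = {2, 3, 4}
print(ngd_distance(article_x, article_y, total_articles=100))
print(jaccard_similarity(article_x, article_y))
```

Note that the first function returns a distance and the second a similarity, so they point in opposite directions. The Jaccard coefficient also does not depend on the total size of the collection.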
This paper analyzes the impact of reducing floating-point precision on the quality of text classification. Reducing the precision of the vectors representing the data (e.g. the TF–IDF representation in our case) allows for a decrease in computing time and memory footprint on dedicated hardware platforms. The impact of precision …
In this paper we try to answer the question of how cross-lingual evidence may improve matching between different classification schemas. We concentrate specifically on the task of mapping between Wikipedia categories and Cyc terms, as well as the classification of Wikipedia articles into the Cyc taxonomy, and show how this process may be improved by consuming the evidence …
We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the performance of language …
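As an illustration of the kind of comparison mentioned above, the sketch below computes a log-likelihood statistic for a single n-gram observed in two corpora, using the common two-corpus formulation based on observed and expected frequencies. The abstract does not give the exact formulation used, so the function name, the formula variant, and the counts are assumptions made for illustration only.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood statistic for one n-gram observed in two corpora.

    freq_a, freq_b: occurrences of the n-gram in corpus A and corpus B.
    size_a, size_b: total number of n-grams in corpus A and corpus B.
    """
    total = size_a + size_b
    # Expected counts if the n-gram were equally likely in both corpora.
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    score = 0.0
    # A zero observed count contributes nothing (limit of x * log(x) is 0).
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2.0 * score


# Hypothetical counts of one POS trigram in written text vs. speech transcripts.
print(log_likelihood(freq_a=20, freq_b=5, size_a=1000, size_b=1000))
```

Under this formulation, a value above roughly 3.84 corresponds to significance at the 0.05 level for one degree of freedom. A value of 0 means the n-gram has the same relative frequency in both corpora.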