Aleksander Smywinski-Pohl

We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the performance of language…
In this paper we try to answer the question of how cross-lingual evidence may improve matching between different classification schemas. We concentrate specifically on the task of mapping between Wikipedia categories and Cyc terms as well as the classification of Wikipedia articles to the Cyc taxonomy and show how this process may be improved by consuming the…
This paper presents the analysis of the impact of a floating-point number precision reduction on the quality of text classification. The precision reduction of the vectors representing the data (e.g. TF–IDF representation in our case) allows for a decrease of computing time and memory footprint on dedicated hardware platforms. The impact of precision…
This document describes an algorithm aimed at recognizing Named Entities in Polish text, which is powered by two knowledge sources: the Polish Wikipedia and the Cyc ontology. Besides providing the rough types for the recognized entities, the algorithm links them to the Wikipedia pages and assigns precise semantic types taken from Cyc. The algorithm is…
In this paper we discuss the problem of building the Polish lexicon for the Cyc ontology. As the ontology is very large and complex we describe semi-automatic translation of part of it, which might be useful for tasks lying on the border between the fields of Semantic Web and Natural Language Processing. We concentrate on precise identification of lexemes,…