This paper describes an application that aims to produce Polish descriptions of data available as Linked Open Data, in particular the contents of the MusicBrainz knowledge base.
We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the performance of language…
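The comparison of POS n-gram frequencies described above can be illustrated with a minimal sketch. It assumes the common two-corpus form of the log-likelihood statistic (observed counts compared with counts expected if both corpora shared one distribution); the abstract does not state which variant the authors used, and the function names and toy tag sequences below are hypothetical.

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return all contiguous n-grams of a tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood statistic (assumed variant).

    a, b: observed counts of one n-gram in corpus 1 and corpus 2
    c, d: total number of n-grams in corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def compare_corpora(written_tags, spoken_tags, n=2):
    """Score every POS n-gram seen in either corpus."""
    written = Counter(ngrams(written_tags, n))
    spoken = Counter(ngrams(spoken_tags, n))
    c, d = sum(written.values()), sum(spoken.values())
    return {
        gram: log_likelihood(written[gram], spoken[gram], c, d)
        for gram in set(written) | set(spoken)
    }


if __name__ == "__main__":
    # Toy, made-up tag sequences; real input would come from a POS tagger.
    written = ["adj", "subst", "fin", "prep", "adj", "subst"]
    spoken = ["interj", "subst", "fin", "interj", "fin", "subst"]
    for gram, score in sorted(compare_corpora(written, spoken).items()):
        print(gram, round(score, 3))
```

With one degree of freedom, a score above roughly 3.84 corresponds to p < 0.05 under the usual chi-squared approximation, so n-grams exceeding that threshold would be candidates for a significant difference between the two corpora.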
The research presented in this article aims to map English Wikipedia categories to OpenCyc types. The mapping algorithm is heuristic and takes into account structural similarities between the categories and the corresponding types. The achieved mapping precision ranges from 82% to 92% (depending on the evaluation scheme), recall…
In this paper we try to answer the question of how cross-lingual evidence may improve matching between different classification schemas. We concentrate specifically on the task of mapping between Wikipedia categories and Cyc terms, as well as the classification of Wikipedia articles to the Cyc taxonomy, and show how this process may be improved by consuming the evidence…
This document describes an algorithm aimed at recognizing Named Entities in Polish text, which is powered by two knowledge sources: the Polish Wikipedia and the Cyc ontology. Besides providing the rough types for the recognized entities, the algorithm links them to the Wikipedia pages and assigns precise semantic types taken from Cyc. The algorithm is…
This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original Normalized Google Distance-inspired measure with a Jaccard coefficient-inspired measure and taking into…
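The difference between the two kinds of relatedness measure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that each article is represented by the set of articles linking to it, that the Normalized Google Distance-inspired score is one minus the normalized distance over those link sets, and that the Jaccard-inspired score is the plain set overlap ratio. The function names are hypothetical, and the additional factors the truncated abstract alludes to are not modeled.

```python
import math


def ngd_relatedness(links_a, links_b, total_articles):
    """Relatedness derived from a Normalized Google Distance-style measure.

    links_a, links_b: sets of article ids linking to each article
    total_articles: number of articles in the whole collection
    """
    common = links_a & links_b
    if not common:
        return 0.0
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    if total_articles <= small:
        raise ValueError("total_articles must exceed the smaller link set")
    distance = (math.log(big) - math.log(len(common))) / (
        math.log(total_articles) - math.log(small)
    )
    # Clamp so that very distant pairs score 0 rather than a negative value.
    return max(0.0, 1.0 - distance)


def jaccard_relatedness(links_a, links_b):
    """Relatedness as the Jaccard coefficient of the two link sets."""
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)


if __name__ == "__main__":
    # Made-up link sets for two candidate senses.
    a = {1, 2, 3, 4}
    b = {3, 4, 5}
    print(ngd_relatedness(a, b, total_articles=1000))
    print(jaccard_relatedness(a, b))
```

One practical difference is that the Jaccard variant needs no collection-wide article count, whereas the distance-based variant depends on it.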