Michal Lopuszynski

In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. The first set of labels is drawn from Wikipedia; the second is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on the dataset …
In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method. Its only language-specific input is a stoplist containing a set of non-content words. The performance of the …
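To make the role of the stoplist concrete, here is a minimal sketch of RAKE-style scoring. It is a simplified illustration and not the implementation used in the work above: the function name `rake_keywords`, the punctuation-based splitting, and the tiny stoplist in the usage note are all assumptions. Candidate phrases are the runs of words between stop words and punctuation, and each phrase is scored by summing the degree/frequency ratio of its words.

```python
import re
from collections import defaultdict


def rake_keywords(text, stoplist):
    """Simplified RAKE: rank candidate phrases by summed word degree/frequency."""
    stop = {w.lower() for w in stoplist}
    candidates = []
    # Punctuation delimits fragments; stop words split each fragment further.
    for fragment in re.split(r"[.,;:!?()\[\]\"\n]+", text.lower()):
        phrase = []
        for word in fragment.split():
            if word in stop:
                if phrase:
                    candidates.append(tuple(phrase))
                phrase = []
            else:
                phrase.append(word)
        if phrase:
            candidates.append(tuple(phrase))

    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(candidates)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, `rake_keywords("public procurement law and procurement", ["and"])` ranks the phrase "public procurement law" above the single word "procurement". Switching to another language only requires passing a different stoplist.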
Topic modelling algorithms are statistical methods capable of detecting common themes present in the analyzed text corpora. In this work, latent Dirichlet allocation (LDA) is used [1]. It operates on documents in the bag-of-words representation and returns a set of detected topics (i.e., groups of words and their probabilities reflecting the importance …
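As a rough illustration of the input and output described above, the following toy collapsed Gibbs sampler fits LDA to documents given as lists of tokens (word order is ignored) and returns, for each topic, a probability for every vocabulary word. This is a sketch under stated assumptions, not the setup used in the work: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative choices, and a real analysis would normally rely on an established library.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-topic word probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens assigned to topic k

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed word distribution for each topic.
    return [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + n_vocab * beta)
         for w in vocab}
        for t in range(n_topics)
    ]
```

Each returned dictionary is one detected topic; sorting its items by probability gives the most characteristic words of that topic.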