Vedrana Vidulin

A web page is a complex document that can share the conventions of several genres, or contain several parts, each belonging to a different genre. To properly address this genre interplay, a recent proposal in automatic web genre identification is multi-label classification. The dominant approach to such classification is to transform one multi-label machine …
Modern search engines are typically queried with keywords, which primarily convey the topic of the sought web page. Consequently, the resulting top hits are often topically relevant but still not what the user wants. The premise of this paper is that the relevance of the hits can be improved when also searching by genre, classification criterion …
We present initial results from an international and multidisciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, …
Modern search engines aim at classifying web pages not only according to topics, but also according to genres. This paper presents the results of an attempt to train a genre classifier. We present features extracted from a 20-genre corpus used for training the genre classifiers and the results of using different machine learning (ML) algorithms in the …
Abbreviated title: Impact of High-Level Knowledge on Economy through IDM. This paper describes a novel algorithm for finding the most important relations with the use of data mining. As an example application, the impact of high-level knowledge on economic welfare was analyzed. Our approach, based on interactive data mining, not only helps to discover the …
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and the observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, two machine learning algorithms, one …
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and the observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, three machine learning algorithms, …
Does greater investment in education and research and development (R&D) have a positive impact on economic welfare? We analyzed this question using the Weka machine learning and data mining systems. We collected data from the statistical databases for the year 2001. The obtained classification trees show that the level of participation in higher levels of …
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations …