Learn More
Search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo—a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe methods used in the algorithm: algebraic transformations of the(More)
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization.(More)
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm we put into consideration – Suffix Tree Clustering has been acknowledged as being very efficient when applied to English. We present conclusions from its(More)
In this paper we present the design goals and implementation outline of Carrot 2 , an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework has been written from scratch keeping in mind flexibility and efficiency of processing. We show two software architectures that meet the(More)
Search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search hits list, returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use Open Directory Project as a source of high-quality narrow-topic document references and mix them(More)
We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora: <i>Webspam-Uk2006</i> and <i>Webspam-Uk2007</i>, we make them publicly available for other researchers. Preliminary analysis seems to indicate that certain linguistic features may be useful for the spam-detection task when(More)
OBJECTIVES The objective of this research was to design a clinical decision support system (CDSS) that supports heterogeneous clinical decision problems and runs on multiple computing platforms. Meeting this objective required a novel design to create an extendable and easy to maintain clinical CDSS for point of care support. The proposed solution was(More)