The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the …
Without search engines, the Internet would be an enormous amount of disorganized information that would certainly be interesting but perhaps not very useful. Search engines help us in all kinds of tasks and are constantly improving result relevance. The Lingo algorithm combines common phrase discovery and latent semantic indexing techniques to separate …
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. …
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search hits list returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project as a source of high-quality, narrow-topic document references and mix them …
This paper concerns search results clustering, a technique for improving the visualization of results in Web search engines. We introduce Carrot 2, an open, extensible research system for examining and developing search results clustering algorithms. We also discuss attempts to measure the quality of discovered clusters and demonstrate results of our …
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering, has been acknowledged as very efficient when applied to English. We present conclusions from its …
This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite state automata representations that can be used directly for traversals in main memory [4]. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not …
In this paper we present the design goals and implementation outline of Carrot 2, an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework has been written from scratch with flexibility and efficiency of processing in mind. We show two software architectures that meet the …