The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search results returned by a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the…
Without search engines, the Internet would be an enormous mass of disorganized information that would certainly be interesting but perhaps not very useful. Search engines help us with all kinds of tasks and keep improving the relevance of their results. The Lingo algorithm combines common phrase discovery and latent semantic indexing techniques to separate…
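The combination of phrase discovery and latent semantic indexing described above can be illustrated with a small sketch. This is not the Lingo implementation. It rests on several assumptions: candidate phrases are supplied by hand instead of being discovered, terms are weighted by raw frequency, and the SVD is approximated by power iteration with deflation. The function names (`cosine`, `top_concepts`, `lingo_sketch`) and the `threshold` parameter are hypothetical. Each "concept" (an approximate left singular vector of the term-document matrix) is labelled with the candidate phrase whose term vector is closest to it. Documents that are similar enough to that label then form its cluster.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_concepts(a, k, iters=200):
    """Approximate the top-k left singular vectors of term-document matrix `a`
    by power iteration with deflation on A*A^T (a stand-in for a full SVD)."""
    n = len(a)
    m = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)] for i in range(n)]
    concepts = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # non-uniform start vector
        for _step in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        if eigenvalue < 1e-9:
            break  # no significant concepts left
        concepts.append(v)
        # Deflate so the next pass converges to the next singular vector.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return concepts


def lingo_sketch(docs, phrases, k, threshold=0.3):
    """Label up to k concepts with candidate phrases and collect matching documents."""
    terms = []
    for doc in docs:
        for word in doc.lower().split():
            if word not in terms:
                terms.append(word)

    def vector(text):
        # Raw term frequencies over the shared vocabulary (hypothetical weighting).
        words = text.lower().split()
        return [words.count(t) for t in terms]

    doc_vectors = [vector(d) for d in docs]
    # Rows are terms, columns are documents.
    a = [[dv[t] for dv in doc_vectors] for t in range(len(terms))]

    clusters = {}
    for concept in top_concepts(a, k):
        # Singular-vector signs are arbitrary, so compare by absolute cosine.
        label = max(phrases, key=lambda p: abs(cosine(vector(p), concept)))
        clusters[label] = [i for i, dv in enumerate(doc_vectors)
                           if cosine(dv, vector(label)) >= threshold]
    return clusters
```

Consider the toy snippets `["apple pie recipe", "apple pie baking", "apple pie crust", "linux kernel patch", "linux kernel build"]` with candidates `["apple pie", "pie recipe", "linux kernel", "kernel build"]` and `k=2`. The sketch returns `{"apple pie": [0, 1, 2], "linux kernel": [3, 4]}`. The longer, more general phrases win because they align best with the dominant concepts.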
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization.
In this paper we present the design goals and implementation outline of Carrot, an open source framework for the rapid development of Web Information Retrieval and Web Mining applications. The framework was written from scratch with flexibility and processing efficiency in mind. We show two software architectures that meet the…
Paid advertisements displayed alongside search results constitute a major source of income for search companies. Optimizations that lead to more clicks on ads are a goal shared by advertisers and search engines. In this context, an ad’s quality can be measured by the probability of its being clicked, assuming it was noticed by the user (click-through…
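The quality measure above is a conditional probability, P(click | noticed). One common way to estimate it is the examination hypothesis, P(click) = P(noticed) · P(click | noticed). That hypothesis is an assumption here, not necessarily the paper's method. Under it, the estimate divides observed clicks by the expected number of times the ad was noticed. The function name and the per-position `examine_prob` values below are hypothetical.

```python
def ad_quality(impressions, examine_prob):
    """Estimate P(click | noticed) for one ad under the examination hypothesis.

    impressions  -- list of (position, clicked) pairs observed for the ad
    examine_prob -- assumed probability that a user notices each position
    """
    clicks = sum(1 for _, clicked in impressions if clicked)
    # Expected number of impressions in which the ad was actually noticed.
    expected_noticed = sum(examine_prob[pos] for pos, _ in impressions)
    return clicks / expected_noticed if expected_noticed else 0.0
```

Suppose an ad gets two clicks from two impressions at position 1 and four at position 2, with assumed examination probabilities `{1: 1.0, 2: 0.5}`. The estimate is then 2 / 4 = 0.5, whereas the raw click-through rate would be 2 / 6. Ads shown in rarely noticed slots are no longer penalised for their position.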
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search hits returned by a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project as a source of high-quality, narrow-topic document references and mix them…
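Mixing documents from known Open Directory Project categories makes it possible to check how well a clustering reproduces those categories. The excerpt does not say which measure the evaluation uses. Purity, shown below, is one simple option and is an assumption here, as is the function name.

```python
from collections import Counter


def purity(clusters, categories):
    """Share of documents whose known category is the majority category of their cluster.

    clusters   -- cluster id assigned to each document by the algorithm
    categories -- known (e.g. directory) category of each document
    """
    if not categories:
        return 0.0
    majority_total = 0
    for cluster_id in set(clusters):
        members = [cat for cid, cat in zip(clusters, categories) if cid == cluster_id]
        # Count the documents that agree with the cluster's most common category.
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(categories)
```

For example, suppose four documents from categories `["Arts", "Arts", "Arts", "Science"]` are clustered as `[0, 0, 1, 1]`. Purity is then 3 / 4 = 0.75. Purity alone rewards many tiny clusters, so it is usually reported together with other measures.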
Descriptive k-Means, on the other hand, uses frequent phrase extraction, noun phrases, and clustering with the k-Means algorithm. The paper presents computational experiments for both algorithms. The experimental results compare clustering quality (understood as how well a known assignment of documents to groups is reproduced) using Lingo…
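The k-Means step mentioned above can be sketched as a plain Lloyd's loop over document vectors, such as term-frequency vectors. Descriptive k-Means adds phrase extraction and cluster labelling on top, and this sketch does not attempt either. The naive first-k initialisation and the function names are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-Means; returns a cluster index for every point."""
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    assignment = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

On the points `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k=2`, the loop returns `[0, 0, 1, 1]` after three passes. The result of k-Means depends on initialisation, so practical systems usually use smarter seeding or several restarts.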
In this paper we consider the problem of clustering Web search results in the Polish language, supporting our analysis with results obtained from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering, has been acknowledged as very efficient when applied to English. We present conclusions from its…
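The core idea of Suffix Tree Clustering can be sketched in two steps:

1. Find "base clusters", that is, phrases shared by several documents.
2. Merge base clusters whose document sets overlap heavily.

This is a simplified sketch under stated assumptions. The generalized suffix tree is replaced by brute-force n-gram enumeration. Stopword removal and base-cluster scoring are omitted. The 0.5 overlap threshold is chosen to match the usual description of the STC merge rule. The function names and parameters are hypothetical.

```python
def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase (word n-gram up to max_len words) to the documents containing it.

    STC builds these from a generalized suffix tree; brute-force n-gram
    enumeration is substituted here for brevity.
    """
    phrase_docs = {}
    for i, doc in enumerate(docs):
        words = doc.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                phrase = " ".join(words[start:start + n])
                phrase_docs.setdefault(phrase, set()).add(i)
    return {p: ds for p, ds in phrase_docs.items() if len(ds) >= min_docs}


def stc_clusters(docs, overlap=0.5):
    """Merge base clusters whose document sets overlap by more than `overlap` both ways."""
    base = list(base_clusters(docs).items())
    parent = list(range(len(base)))  # union-find forest over base clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(base)):
        for j in range(i + 1, len(base)):
            shared = len(base[i][1] & base[j][1])
            if shared / len(base[i][1]) > overlap and shared / len(base[j][1]) > overlap:
                parent[find(i)] = find(j)

    groups = {}
    for i, (phrase, doc_ids) in enumerate(base):
        members, phrases = groups.setdefault(find(i), (set(), []))
        members.update(doc_ids)
        phrases.append(phrase)
    # Each final cluster: (sorted document ids, sorted phrases describing it).
    return sorted((sorted(m), sorted(p)) for m, p in groups.values())
```

Consider the snippets `["apple pie recipe", "easy apple pie", "apple pie crust", "linux kernel patch", "linux kernel build"]`. The sketch yields two clusters: documents `[0, 1, 2]` described by `apple`, `apple pie`, `pie`, and documents `[3, 4]` described by `kernel`, `linux`, `linux kernel`. Because merging works on shared phrases rather than a fixed partition, a document can end up in more than one cluster. Applying this to Polish would additionally require handling its rich inflection, which the sketch ignores.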
We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and we make them publicly available to other researchers. Preliminary analysis seems to indicate that certain linguistic features may be useful for the spam-detection task when…