The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search results returned by a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the …
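Lingo derives abstract concepts from the term-document matrix using singular value decomposition. For each concept, it then picks the candidate phrase that matches the concept best. The sketch below is a simplified, pure-Python illustration under stated assumptions and is not the authors' implementation. It uses raw term counts instead of tf-idf weighting. It approximates only the dominant left singular vector, using power iteration rather than a full SVD. It scores candidate labels by cosine similarity to that single concept. The function names are hypothetical.

```python
import math

def term_document_matrix(docs):
    """Build a terms x documents matrix of raw term counts.
    Assumption: Lingo weights terms with tf-idf; raw counts keep this short."""
    terms = sorted({word for doc in docs for word in doc.split()})
    matrix = [[doc.split().count(term) for doc in docs] for term in terms]
    return terms, matrix

def dominant_concept(matrix, iterations=100):
    """Approximate the first left singular vector of `matrix` (the dominant
    'abstract concept') by power iteration on the Gram matrix A * A^T."""
    if not matrix:
        return []
    n_terms, n_docs = len(matrix), len(matrix[0])
    gram = [[sum(matrix[i][k] * matrix[j][k] for k in range(n_docs))
             for j in range(n_terms)] for i in range(n_terms)]
    vec = [1.0 / math.sqrt(n_terms)] * n_terms  # unit-length starting vector
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n_terms)) for i in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all-zero matrix: nothing to converge to
            break
        vec = [x / norm for x in nxt]
    return vec

def best_label(docs, candidates):
    """Pick the candidate phrase whose binary term vector has the highest
    cosine similarity to the dominant concept vector."""
    terms, matrix = term_document_matrix(docs)
    concept = dominant_concept(matrix)

    def score(phrase):
        words = set(phrase.split())
        phrase_vec = [1.0 if term in words else 0.0 for term in terms]
        norm = math.sqrt(sum(phrase_vec))  # binary vector: sum of squares == sum
        if norm == 0.0:
            return 0.0
        # concept has unit length, so dot / norm is the cosine similarity
        return sum(c * p for c, p in zip(concept, phrase_vec)) / norm

    return max(candidates, key=score)
```

Consider the documents "data mining clustering", "data mining", "clustering data" and "java coffee" with the candidates "data mining", "java coffee" and "coffee". Here `best_label` returns "data mining". The dominant concept is spread over "clustering", "data" and "mining", while "java" and "coffee" form a weaker, separate block. Lingo itself keeps several concepts rather than just the first one.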
Web clustering engines organize search results by topic, offering a view that complements the flat ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed when developing a Web clustering engine, including the acquisition and preprocessing of search results and their clustering and visualization.
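A toy sketch can illustrate the pipeline stages named in the survey. Several assumptions apply. Acquisition is skipped, so snippets are passed in directly. The stop-word list is a tiny hypothetical one. The clustering step is a deliberately naive stand-in that groups snippets sharing a term; it is not an algorithm taken from the survey. The "visualization" is a plain-text list of labelled clusters.

```python
import re
from collections import defaultdict

# Tiny, hypothetical stop-word list for illustration only
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def preprocess(snippet):
    """Lowercase, tokenize on runs of letters/digits, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", snippet.lower()) if t not in STOP_WORDS]

def cluster_by_term(snippets, min_size=2):
    """Naive stand-in clustering: one cluster per term shared by at least
    `min_size` snippets; a snippet may appear in several clusters."""
    groups = defaultdict(set)
    for idx, snippet in enumerate(snippets):
        for term in set(preprocess(snippet)):
            groups[term].add(idx)
    return {term: sorted(ids) for term, ids in groups.items() if len(ids) >= min_size}

def render(snippets, clusters):
    """Plain-text 'visualization': labels with sizes, largest first, ties by label."""
    lines = []
    for label, ids in sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        lines.append(f"{label} ({len(ids)})")
        lines.extend(f"  - {snippets[i]}" for i in ids)
    return "\n".join(lines)
```

With the snippets "Jaguar the big cat", "Jaguar car dealers", "Big cat conservation" and "Used car prices", the sketch forms the clusters "big", "car", "cat" and "jaguar". This shows why real engines need better clustering and labelling: "big" and "cat" describe the same topic, yet they end up as separate clusters.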
In this paper we consider the problem of clustering web search results in Polish, supporting our analysis with results obtained from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering (STC), has been acknowledged as very efficient when applied to English. We present conclusions from its …
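Suffix Tree Clustering, as proposed by Zamir and Etzioni, works in two steps. First, it finds base clusters: phrases shared by several documents, each paired with the set of documents that contain it. Second, it links two base clusters when their document sets overlap by more than half of each one's size. The final clusters are the connected components of the resulting graph.

The sketch below follows those two steps with some substitutions. A plain dictionary of word n-grams replaces the generalized suffix tree; it finds shared phrases up to a hypothetical `max_len` words, less efficiently. Base-cluster scoring is omitted. Words are split on whitespace, so none of the Polish-specific preprocessing studied in the paper is reflected.

```python
from itertools import combinations

def base_clusters(docs, max_len=3, min_docs=2):
    """Map each word phrase (up to max_len words) to the set of documents
    containing it; a dict of n-grams stands in for the suffix tree."""
    phrase_docs = {}
    for doc_id, doc in enumerate(docs):
        words = doc.lower().split()
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                phrase_docs.setdefault(" ".join(words[start:end]), set()).add(doc_id)
    # Only phrases shared by at least `min_docs` documents become base clusters
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}

def merge_base_clusters(bases, threshold=0.5):
    """Link two base clusters when their overlap exceeds `threshold` of each
    one's size; return connected components (via union-find) as clusters."""
    phrases = list(bases)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        overlap = len(bases[a] & bases[b])
        if overlap / len(bases[a]) > threshold and overlap / len(bases[b]) > threshold:
            parent[find(a)] = find(b)

    components = {}
    for p in phrases:
        components.setdefault(find(p), []).append(p)
    return [
        {"phrases": sorted(group), "docs": sorted(set().union(*(bases[p] for p in group)))}
        for group in components.values()
    ]
```

Consider the documents "cat ate cheese", "cat ate fish", "stock market crash" and "stock market rally". The sketch yields two clusters. The first covers documents 0 and 1 with the phrases "ate", "cat" and "cat ate". The second covers documents 2 and 3.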
Assessing the impact of change on a software development project is a critical management activity. Traceability lets us manage the change process through notification and synchronisation mechanisms. We present an architecture, developed as part of the EU-funded Ophelia project, that supports traceability across all project artefacts.
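Change notification over trace links can be pictured as graph reachability. When an artefact changes, every artefact that transitively depends on it is a candidate for notification. The sketch below is a hypothetical illustration of that idea and is not the Ophelia architecture. The artefact names and the dictionary-of-lists link representation are assumptions.

```python
from collections import deque

def impacted_artefacts(trace_links, changed):
    """Return every artefact reachable from `changed` along trace links,
    in breadth-first order; cycles are handled via the `seen` set."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in trace_links.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order
```

Suppose a hypothetical requirement is traced to a design, and that design is traced to a class and a test. Changing the requirement then flags the design, the class and the test, each exactly once.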
In this paper we present the design goals and implementation outline of Carrot2, an open source framework for the rapid development of Web Information Retrieval and Web Mining applications. The framework was written from scratch with flexibility and processing efficiency in mind. We show two software architectures that meet the …
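A framework designed for flexibility usually lets independent processing components be chained and swapped. The sketch below illustrates that general idea with a hypothetical `Pipeline` class, where each component maps a list of documents to a new list. It is not Carrot2's actual API. The component names and the document fields (`title`, `snippet`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Document = Dict[str, str]
Component = Callable[[List[Document]], List[Document]]

@dataclass
class Pipeline:
    """Hypothetical processing chain: components run in order, each one
    receiving the previous component's output."""
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> "Pipeline":
        self.components.append(component)
        return self  # allows add(...).add(...) chaining

    def run(self, documents: List[Document]) -> List[Document]:
        for component in self.components:
            documents = component(documents)
        return documents

def drop_empty_snippets(docs: List[Document]) -> List[Document]:
    """Example filter component: discard documents without a snippet."""
    return [d for d in docs if d.get("snippet")]

def lowercase_titles(docs: List[Document]) -> List[Document]:
    """Example transform component: copy each document with a lowercased title."""
    return [{**d, "title": d["title"].lower()} for d in docs]
```

Components are plain functions that return new lists, so any step can be reordered, removed, or replaced without touching the others. The caller's input documents are never modified.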
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search hits returned by a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project as a source of high-quality, narrow-topic document references and mix them …
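When documents from known Open Directory Project categories are mixed and then clustered, each cluster can be checked against those reference categories. The sketch below computes purity, a standard external measure used here as an assumption; the paper's own evaluation measures may differ. The category labels are hypothetical. Lingo may assign a document to more than one cluster, so documents are counted once per cluster they appear in.

```python
from collections import Counter

def purity(clusters, categories):
    """For each cluster, count its most common reference category; sum these
    counts and divide by the total number of (cluster, document) memberships."""
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    majority = sum(
        Counter(categories[doc] for doc in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster
    )
    return majority / total
```

Take four documents, two from a music category and two from a physics category. A clustering that places one physics document with the music pair scores 0.75. A clustering that separates the categories perfectly scores 1.0.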
  • Krzysztof Dembczyński, Wojciech Kotłowski, Dawid Weiss
  • 2008
Paid advertisements displayed alongside search results are a major source of income for search companies. Optimizations that lead to more clicks on ads are a goal shared by advertisers and search engines. In this context, an ad's quality can be measured by the probability that it is clicked, given that it was noticed by the user (click-through …
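Measuring quality as the click probability given that the ad was noticed is commonly formalized by the examination hypothesis: P(click at position p) = P(noticed at p) × P(click | noticed). The sketch below estimates P(click | noticed) with a simple moment estimator: it divides the observed clicks by the expected number of times the ad was noticed. The per-position examination probabilities are hypothetical inputs. This is not necessarily the method used in the paper.

```python
def position_normalized_ctr(impressions, examination_prob):
    """Estimate P(click | noticed) under the examination hypothesis.

    impressions: list of (position, clicked) pairs for one ad.
    examination_prob: hypothetical map position -> P(noticed at position).
    """
    clicks = sum(1 for _, clicked in impressions if clicked)
    # Expected number of impressions the user actually noticed
    expected_views = sum(examination_prob[pos] for pos, _ in impressions)
    return clicks / expected_views if expected_views else 0.0
```

Assume position 1 is always noticed and position 2 is noticed half the time. Two impressions at position 1 with one click, plus four at position 2 with one click, give a raw CTR of 2/6. The position-normalized estimate is 2/4 = 0.5, because only four impressions are expected to have been seen.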