Raffaele Perego

This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy designed to efficiently exploit the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of …
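To make the static/dynamic split concrete, here is a minimal sketch of a two-part result cache. It is not the authors' SDC implementation: the abstract is cut off before it says what the static part holds, so filling it with the most frequent queries of a historical log is an assumption, as is the LRU policy for the dynamic part. The names `StaticDynamicCache`, `fetch`, `static_size`, and `dynamic_size` are hypothetical.

```python
from collections import Counter, OrderedDict
from typing import Callable, Iterable, List


class StaticDynamicCache:
    """Two-part query-result cache (illustrative sketch, not the paper's code)."""

    def __init__(self, history: Iterable[str], static_size: int,
                 dynamic_size: int, fetch: Callable[[str], List[str]]):
        self._fetch = fetch  # hypothetical back-end call that computes results
        # Assumption: the static part holds the most frequent historical queries.
        top = [q for q, _ in Counter(history).most_common(static_size)]
        self._static = {q: fetch(q) for q in top}  # read-only after construction
        # Assumption: the dynamic part uses LRU replacement.
        self._dynamic: "OrderedDict[str, List[str]]" = OrderedDict()
        self._dynamic_size = dynamic_size
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> List[str]:
        if query in self._static:
            self.hits += 1
            return self._static[query]
        if query in self._dynamic:
            self.hits += 1
            self._dynamic.move_to_end(query)  # mark as most recently used
            return self._dynamic[query]
        self.misses += 1
        results = self._fetch(query)
        if self._dynamic_size > 0:
            self._dynamic[query] = results
            if len(self._dynamic) > self._dynamic_size:
                self._dynamic.popitem(last=False)  # evict least recently used
        return results
```

In this sketch the static entries are never evicted, so a burst of one-off queries can only displace entries in the dynamic part.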
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database and adopts a particular visit and partitioning …
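As an illustration of what a bitwise vertical representation looks like, the sketch below stores one bitmap per item, with bit `t` set when transaction `t` contains the item, so that support counting reduces to bitwise AND. It shows only the general idea plus the textbook closure test; it does not reproduce the paper's visiting or partitioning strategy, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List


def build_vertical_bitmaps(transactions: List[Iterable[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit t is 1 iff transaction t contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def cover(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> int:
    """Bitmap of the transactions that contain every item of the itemset."""
    mask = (1 << n) - 1  # start from all n transactions
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return mask


def support(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> int:
    """Number of transactions containing the itemset (population count)."""
    return bin(cover(itemset, bitmaps, n)).count("1")


def closure(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> FrozenSet[str]:
    """All items present in every transaction that contains the itemset."""
    mask = cover(itemset, bitmaps, n)
    return frozenset(i for i, bits in bitmaps.items() if bits & mask == mask)


transactions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c"]]
bitmaps = build_vertical_bitmaps(transactions)
n = len(transactions)
print(support({"a", "c"}, bitmaps, n))    # 2
print(sorted(closure({"a"}, bitmaps, n))) # ['a', 'b']
```

An itemset is closed when it equals its own closure. Here `{"a"}` is not closed, because every transaction containing `a` also contains `b`.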
The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web search engine to carry out a given task. In order to evaluate and compare different approaches, we built, by means of a manual labeling process, a …
The scalability, as well as the effectiveness, of the different Content-based Image Retrieval (CBIR) approaches proposed in the literature is today an important research issue. Given the wealth of images on the Web, CBIR systems must in fact leap towards Web-scale datasets. In this paper, we report on our experience in building a test collection of 100 million …
The performance of an algorithm that mines frequent sets from transactional databases may depend heavily on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used (e.g., the available main memory) can dramatically change the runtime behavior of the algorithm. In this paper we …
The discovery of patterns in binary datasets has many applications, e.g., in electronic commerce, TCP/IP networking, and Web usage logging. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, and extraction of only the most important patterns. In this paper we formalize the problem of discovering …
This paper proposes an efficient and effective solution to the problem of choosing the queries to suggest to Web search engine users in order to help them rapidly satisfy their information needs. By exploiting a weak function for assessing the similarity between the current query and the knowledge base built from historical users’ sessions, we …
Web search engines provide a large-scale text document retrieval service by processing huge Inverted File indexes. Inverted File indexes allow fast query resolution and good memory utilization, since their d-gap representation can be effectively and efficiently compressed by using variable-length encoding methods. This paper proposes and …
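The d-gap idea can be shown in a short sketch: a sorted posting list is stored as differences between consecutive document IDs, and the resulting small integers are packed with a variable-length code. Variable-byte coding is used here only as one well-known example of such a code; the abstract does not say which encoding the paper uses, and all function names are hypothetical.

```python
from typing import Iterable, List


def to_dgaps(doc_ids: List[int]) -> List[int]:
    """Turn a strictly increasing posting list into gaps (first ID kept as-is)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_dgaps(gaps: Iterable[int]) -> List[int]:
    """Rebuild document IDs by a running sum over the gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Variable-byte code: 7 payload bits per byte, low-order group first.
    The high bit is set only on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # terminator byte: finish the current number
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


postings = [3, 7, 300, 100000]
encoded = vbyte_encode(to_dgaps(postings))
print(to_dgaps(postings))  # [3, 4, 293, 99700]
print(len(encoded))        # 7
```

Small gaps take a single byte each, which is why clustered document IDs compress well under this kind of scheme.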
One of the main problems arising in frequent closed itemset mining is duplicate detection. In this paper we propose a general technique for promptly detecting and discarding duplicate closed itemsets, without the need to keep the whole set of closed patterns in main memory. Our approach can be exploited with substantial performance …