The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web Search Engine to carry out a given task. To evaluate and compare different approaches, we built, by means of a manual labeling process, a …
This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of …
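The static/dynamic split can be illustrated with a minimal sketch. The abstract is cut off before it describes SDC's actual policies, so everything here is an assumption: the hypothetical `StaticDynamicCache` fills a read-only static part with the results of the most frequent queries in a historical log and manages the remaining slots as an LRU dynamic part, while `compute_results` stands in for a full query evaluation.

```python
from collections import Counter, OrderedDict


class StaticDynamicCache:
    """Hypothetical static + dynamic result cache (illustrative only)."""

    def __init__(self, query_log, static_size, dynamic_size, compute_results):
        self.compute_results = compute_results
        # Static part (assumption): results of the most frequent past queries,
        # precomputed once and never evicted.
        top = Counter(query_log).most_common(static_size)
        self.static = {q: compute_results(q) for q, _ in top}
        # Dynamic part (assumption): LRU over all other queries.
        self.dynamic = OrderedDict()
        self.dynamic_size = dynamic_size
        self.hits = 0
        self.misses = 0

    def lookup(self, query):
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.dynamic.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self.dynamic[query]
        self.misses += 1
        results = self.compute_results(query)  # stands in for a full search
        if self.dynamic_size > 0:
            self.dynamic[query] = results
            if len(self.dynamic) > self.dynamic_size:
                self.dynamic.popitem(last=False)  # evict least recently used
        return results


# Usage with a toy log and a fake "search" function.
log = ["weather", "news", "weather", "maps", "weather", "news"]
cache = StaticDynamicCache(log, static_size=1, dynamic_size=2,
                           compute_results=lambda q: [q.upper()])
for q in ["weather", "sports", "news", "sports", "maps", "news"]:
    cache.lookup(q)
print(cache.hits, cache.misses)  # 2 4
```

In this toy stream the frequent query "weather" is always served by the static part, while the LRU part absorbs the short-term repetition of "sports".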
This paper presents a parallel programming methodology that ensures easy programming, efficiency, and portability of programs to different machines belonging to the class of general-purpose, distributed-memory MIMD architectures. The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of …
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database, and adopts a particular visit and …
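A small sketch of what a bitwise vertical representation buys: each item maps to a bitmap of the transactions that contain it, the support of an itemset is the popcount of the AND of its items' bitmaps, and the closure of an itemset is the set of items whose bitmaps cover that result. The enumeration below is a naive depth-first search that closes every frequent itemset and collapses duplicates with a dictionary; it is not the paper's divide-and-conquer algorithm or its visit order, and all function names are hypothetical.

```python
def vertical_bitmaps(transactions):
    """Map each item to an int whose bit t is set iff transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def closure(bits, bitmaps):
    """Items that occur in every transaction selected by `bits`."""
    return frozenset(i for i, b in bitmaps.items() if b & bits == bits)


def closed_frequent_itemsets(transactions, min_sup):
    """Naive enumeration (assumes min_sup >= 1); not the paper's method."""
    bitmaps = vertical_bitmaps(transactions)
    items = sorted(bitmaps)
    result = {}  # closed itemset -> support; the dict also drops duplicates

    def dfs(prefix_bits, start):
        for k in range(start, len(items)):
            bits = prefix_bits & bitmaps[items[k]]  # tidset intersection
            support = bin(bits).count("1")
            if support < min_sup:
                continue  # no superset of an infrequent itemset is frequent
            result[closure(bits, bitmaps)] = support
            dfs(bits, k + 1)

    dfs((1 << len(transactions)) - 1, 0)  # start from "all transactions"
    return result


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
print(closed_frequent_itemsets(db, min_sup=2))
```

In this toy database "a" and "b" always occur together, so the frequent itemsets {a}, {b}, and {a, b} all collapse to the single closed itemset {a, b} with support 3. This is exactly the redundancy that a closed-itemset miner avoids reporting.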
Web Search Engines provide a large-scale text document retrieval service by processing huge Inverted File indexes. Inverted File indexes allow fast query resolution and good memory utilization, since their d-gap representation can be compressed effectively and efficiently with variable-length encoding methods. This paper proposes and …
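The d-gap idea can be illustrated with a short sketch. A sorted posting list of document identifiers is stored as the first identifier followed by the differences between consecutive identifiers, and the resulting small integers are compressed with variable-byte (VByte) coding. VByte is only one of several variable-length codes (Elias gamma and delta and Golomb codes are others), the stop-bit convention used here (high bit set on the last byte of each number) is one of several in use, and the function names are hypothetical.

```python
def to_dgaps(doc_ids):
    """Sorted docIDs -> first docID, then differences between neighbours."""
    gaps, prev = [], 0
    for doc in doc_ids:
        gaps.append(doc - prev)
        prev = doc
    return gaps


def from_dgaps(gaps):
    """Prefix sums undo the gap transform."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """7 data bits per byte, low-order group first; high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # terminating byte of the current number
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


postings = [3, 7, 150, 152, 100000]
gaps = to_dgaps(postings)          # [3, 4, 143, 2, 99848]
encoded = vbyte_encode(gaps)
print(len(encoded))                # 8 bytes vs. 20 as 32-bit integers
assert from_dgaps(vbyte_decode(encoded)) == postings
```

Most gaps fit in a single byte, which is why the smaller the gaps are, the better the compression. Anything that clusters related documents under nearby identifiers tends to help.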
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a spidered collection of web HTML documents. The whole collection contains about 1.7 million documents, mainly written …
Inspired by emerging multi-core computer architectures, in this paper we present MT CLOSED, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of our knowledge, this is the first parallel FCIM algorithm proposed so far. We studied how different duplicate-checking techniques, typical of FCIM algorithms, may affect this …
The datasets used for data mining are increasingly huge and physically distributed. Since the distributed knowledge discovery process is both data- and compute-intensive, the Grid is a natural platform for deploying a high-performance data mining service. The focus of this paper is on the core services of such a Grid infrastructure. In …
One of the main problems arising in frequent closed itemset mining is duplicate detection. In this paper we propose a general technique for promptly detecting and discarding duplicate closed itemsets without keeping the whole set of closed patterns in main memory. Our approach can be exploited with substantial …