The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e. sets of possibly non-contiguous queries issued by the user of a Web Search Engine for carrying out a given task. In order to evaluate and compare different approaches, we built, by means of a manual labeling process, a …
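For contrast, here is a minimal sketch of the classical time-gap segmentation of a query log, the kind of baseline that task-based session detection has to go beyond, since queries belonging to the same task may be non-contiguous and interleaved with unrelated ones. The threshold, data layout, and function name are illustrative assumptions, not the approach studied in the paper.

```python
# Hedged sketch: time-gap session splitting, a classical baseline.
# Queries from the same task may be interleaved with unrelated ones,
# so a pure time threshold can merge distinct tasks into one session.

def time_gap_sessions(query_log, gap_seconds=1800):
    """Split a user's query log [(timestamp, query), ...] whenever the gap
    between consecutive queries exceeds gap_seconds."""
    sessions, current, last_ts = [], [], None
    for ts, query in sorted(query_log):
        if last_ts is not None and ts - last_ts > gap_seconds:
            sessions.append(current)
            current = []
        current.append(query)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

log = [(0, "rome flights"), (60, "python csv module"),
       (120, "cheap hotels rome"), (200, "rome weather april")]
print(time_gap_sessions(log))
# -> one time-based session, although it interleaves two distinct tasks
#    (trip planning vs. programming), which task-based detection should separate.
```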
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database, and adopts a particular visit and partitioning …
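As a rough illustration of what a bitwise vertical representation looks like (a sketch under simplifying assumptions, not the data structures actually used by the algorithm described above): each item is associated with a bitmap over transaction identifiers, and the support of an itemset is the population count of the bitwise AND of its items' bitmaps.

```python
# Sketch of a bitwise vertical database layout for itemset mining.
# Each item maps to an integer used as a bitmap over transaction IDs:
# bit t is set iff transaction t contains the item.

from functools import reduce

def vertical_bitmaps(transactions):
    """Build {item: bitmap} from a list of transactions (iterables of items)."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(itemset, bitmaps):
    """Number of transactions containing every item of the itemset."""
    covered = reduce(lambda a, b: a & b, (bitmaps[i] for i in itemset))
    return bin(covered).count("1")

db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
bm = vertical_bitmaps(db)
print(support({"a", "c"}, bm))  # -> 3
```

Intersecting bitmaps in this way is what makes a divide-and-conquer visit of the search space cheap: the projection of the database on a candidate itemset is obtained with bitwise ANDs rather than by rescanning transactions.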
This paper presents a parallel programming methodology that ensures easy programming, efficiency, and portability of programs to different machines belonging to the class of general-purpose, distributed-memory, MIMD architectures. The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of …
This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of …
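The following is a minimal sketch of the static/dynamic split that SDC is built around: a read-only static portion seeded with the historically most frequent queries, plus a small dynamically managed portion for the rest of the stream. The LRU policy, the lazy filling of the static entries, and all names are assumptions made for illustration, not the exact design evaluated in the article.

```python
from collections import Counter, OrderedDict

class StaticDynamicCache:
    """Sketch of a static/dynamic query-result cache (SDC-like split)."""

    def __init__(self, capacity, static_fraction, history):
        static_size = int(capacity * static_fraction)
        top = [q for q, _ in Counter(history).most_common(static_size)]
        self.static = {q: None for q in top}     # fixed set of cached queries
        self.dynamic = OrderedDict()             # LRU-managed remainder
        self.dynamic_size = capacity - static_size

    def get(self, query, compute_results):
        if query in self.static:                 # static hit: entry never evicted
            if self.static[query] is None:
                self.static[query] = compute_results(query)
            return self.static[query]
        if query in self.dynamic:                # dynamic hit: refresh LRU order
            self.dynamic.move_to_end(query)
            return self.dynamic[query]
        results = compute_results(query)         # miss: compute and cache
        self.dynamic[query] = results
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)     # evict least recently used
        return results
```

The intent of such a split is that the static entries capture long-term locality mined from past logs, while the dynamic part adapts to short-term bursts in the current query stream.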
In this paper we investigate some issues and solutions related to the design of a Data Warehouse (DW) storing several aggregate measures about trajectories of moving objects. First we discuss the loading phase of our DW, which has to deal with overwhelming streams of trajectory observations, possibly produced at different rates and arriving in an …
Web Search Engines provide a large-scale text document retrieval service by processing huge Inverted File indexes. Inverted File indexes allow fast query resolution and good memory utilization, since their d-gaps representation can be effectively and efficiently compressed by using variable-length encoding methods. This paper proposes and …
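As background on the compression step mentioned above, the sketch below shows the d-gap transformation of a posting list followed by variable-byte encoding, one common variable-length code; it illustrates the general technique, not the specific encoders proposed in the paper.

```python
def dgaps(postings):
    """Turn an ascending posting list of document IDs into d-gaps."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def vbyte_encode(numbers):
    """Variable-byte encoding: 7 payload bits per byte, most significant
    chunk first, high bit set on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers

postings = [3, 7, 11, 23, 29, 127, 128, 300]
gaps = dgaps(postings)                 # [3, 4, 4, 12, 6, 98, 1, 172]
assert vbyte_decode(vbyte_encode(gaps)) == gaps
```

Because gaps are much smaller than absolute document IDs, most of them fit in a single byte, which is where the memory savings come from.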
Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms …
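To give a concrete idea of what an entity-relatedness feature can look like, the function below sketches the Milne-Witten relatedness, a measure widely used in the entity-linking literature and computed from the sets of knowledge-base pages linking to each entity; it is offered as an illustrative example, not necessarily the measure this paper argues for.

```python
import math

def milne_witten_relatedness(inlinks_a, inlinks_b, num_pages):
    """Relatedness of two entities from the in-link sets of their
    knowledge-base pages and the total number of pages (Milne-Witten style)."""
    a, b = len(inlinks_a), len(inlinks_b)
    common = len(inlinks_a & inlinks_b)
    if common == 0 or min(a, b) == 0:
        return 0.0
    distance = (math.log(max(a, b)) - math.log(common)) / (
        math.log(num_pages) - math.log(min(a, b))
    )
    return max(0.0, 1.0 - distance)

# Toy example with hypothetical in-link sets and knowledge-base size.
print(round(milne_witten_relatedness({"p1", "p2", "p3"},
                                     {"p2", "p3", "p4"}, 1_000_000), 3))
```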
Inspired by emerging multi-core computer architectures, in this paper we present MT_CLOSED, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of our knowledge, this is the first parallel FCIM algorithm proposed so far. We studied how different duplicate checking techniques, typical of FCIM algorithms, may affect this …
Entity Linking (EL) makes it possible to automatically link unstructured data with entities in a Knowledge Base. Linking unstructured data (like news, blog posts, and tweets) has several important applications: for example, it allows the text to be enriched with useful external content, or the categorization and retrieval of documents to be improved. In recent years many …
The performance of an algorithm that mines frequent sets from transactional databases may severely depend on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used (e.g., the available main memory) can dramatically change the runtime behavior of the algorithm. In this paper we …