We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The basic idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by …
This work presents an Application Domain model for Adaptive Hypermedia Systems and an architecture for its support. For the description of the high-level structure of the application domain we propose an object-oriented model based on the class diagrams of the Unified Modeling Language, extended with (i) a graph-based formalism for capturing navigational …
We propose an incremental technique for discovering duplicates in large databases of textual sequences, i.e. syntactically different tuples that refer to the same real-world entity. The problem is approached from a clustering perspective: given a set of tuples, the objective is to partition them into groups of duplicate tuples. Each newly arrived tuple is …
The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental to classification effectiveness, since the a priori chosen substructures may …
The increasing relevance of the Web as a means of sharing information around the world has raised several interesting new issues for the computer science research community. The traditional approaches to information handling are ineffective in the new context: they are mainly devoted to the management of highly structured information, like relational …
In this work we propose DAEDALUS, a formal framework and system focused on the progressive combination of mining and querying operators. The core component of DAEDALUS is the MO-DMQL query language, which extends SQL in two respects: a pattern definition operator and the capability to manipulate both raw data and unveiled patterns uniformly. …
We introduce a measure of similarity between two sequences of accesses to Web pages, to be exploited in a clustering approach for grouping sessions of accesses to a Web site. The notion of sequence similarity is parametric in the sequence topology and in the similarity among Web pages within the sequences. In our formalization, two Web pages …
We propose a hierarchical, model-based co-clustering framework for handling high-dimensional datasets. The technique views the dataset as a joint probability distribution over row and column variables. Our approach starts by clustering tuples in a dataset, where each cluster is characterized by a different probability distribution. Subsequently, the …