Andrea Tagarelli

We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The basic idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and ...
Dealing with structure and content semantics underlying semistructured documents is challenging for any task of document management and knowledge discovery conceived for such data. In this work we address the novel problem of clustering semantically related XML documents according to their structure and content features. XML features are generated by ...
Document clustering has been recognized as a central problem in text data management. Such a problem becomes particularly challenging when document contents are characterized by subtopical discussions that are not necessarily relevant to each other. Existing methods for document clustering have traditionally assumed that a document is an indivisible unit ...
The increasing availability of heterogeneous XML information sources has raised a number of issues concerning how to represent and manage semistructured data. Although XML sources can exhibit their own structures and contents, differently annotated XML documents may in principle encode related semantics due to subjective definitions of markup tags. Discovering ...
The increasing relevance of the Web as a means of sharing information around the world has raised several interesting new issues for the computer science research community. The traditional approaches to information handling are ineffective in the new context: they are mainly devoted to the management of highly structured information, like relational ...
We introduce a technique based on data mining algorithms for classifying incoming messages, as a basis for an overall architecture for maintenance and management of e-mail messages. We exploit clustering techniques for grouping structured and unstructured information extracted from e-mail messages in an unsupervised way, and exploit the resulting algorithm ...
A common limitation of most existing methods that manage XML structure information is that they do not handle the semantic meanings that might be associated with the markup tags. In this paper, we study how to map structure information available from XML elements into semantically related concepts in order to support the generation of XML semantic features of XML ...
The growing availability of information on the Web has raised a challenging problem: can a Web-based information system tailor itself to different user requirements with the ultimate goal of personalizing and improving the users' experience in accessing the contents of a website? This paper proposes a new approach to website personalization based on the ...