A Tool for Semi-Automatic Generation and Maintenance of Taxonomies from Semi-Structured Documents

Abstract

This chapter introduces OntoExtractor, a tool for the semi-automatic generation of a taxonomy from a set of documents or data sources. The tool builds the taxonomy bottom-up. Starting from a structural analysis of the documents, it produces a set of clusters, which can then be refined by a further grouping based on content analysis. The tool automatically generates metadata describing the content of each cluster and analyses it to produce the final taxonomy. The chapter also describes a simulation of a tool for maintaining the taxonomy, based on implicit and explicit voting. The author presents a system that can generate the taxonomy from heterogeneous information sources, using wrappers that convert each document from its original format into a structured one. In this way, OntoExtractor can, in principle, generate a taxonomy from any information source simply by adding the appropriate wrapper. Moreover, the trust mechanism provides a reliable way to maintain the taxonomy and to correct the wrong classes that are inevitably generated.
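The abstract does not specify the similarity measure or the voting formula, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes each document has already been converted by a wrapper into a set of element paths. It then groups documents structurally with a greedy threshold clustering over Jaccard similarity. Finally, it scores a taxonomy class with a smoothed fraction of weighted positive votes, where explicit votes count more than implicit ones. All names (`jaccard`, `cluster_by_structure`, `TaxonomyClass`), the 0.5 threshold, and the vote weights are hypothetical and not taken from OntoExtractor.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights: an explicit vote (a user rating a class) counts more
# than an implicit one (e.g. a user browsing a class). Not from the paper.
EXPLICIT_WEIGHT = 1.0
IMPLICIT_WEIGHT = 0.25


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets of structural features (element paths)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_structure(docs: dict[str, frozenset], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single pass: a document joins the first cluster whose first member
    is at least `threshold` similar to it; otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for name, paths in docs.items():
        for cluster in clusters:
            if jaccard(paths, docs[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


@dataclass
class TaxonomyClass:
    label: str
    up: float = 0.0
    down: float = 0.0

    def vote(self, positive: bool, explicit: bool = True) -> None:
        """Record one weighted vote for or against this class."""
        weight = EXPLICIT_WEIGHT if explicit else IMPLICIT_WEIGHT
        if positive:
            self.up += weight
        else:
            self.down += weight

    def trust(self) -> float:
        """Laplace-smoothed share of positive votes; 0.5 when there are no votes."""
        return (self.up + 1) / (self.up + self.down + 2)


# Usage: three wrapped documents, two of which share most of their structure.
docs = {
    "a.xml": frozenset({"/book/title", "/book/author", "/book/year"}),
    "b.xml": frozenset({"/book/title", "/book/author"}),
    "c.xml": frozenset({"/invoice/id", "/invoice/total"}),
}
clusters = cluster_by_structure(docs)
print(clusters)  # [['a.xml', 'b.xml'], ['c.xml']]

books = TaxonomyClass("Books")
books.vote(True)
books.vote(False)
books.vote(False)
print(round(books.trust(), 2))  # 0.4 -> a candidate for review below a 0.5 cut-off
```

Smoothing keeps a class with few votes near a neutral score, so a single negative vote does not immediately discard a newly generated class.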

1 Figure or Table

Cite this paper

@inproceedings{Leida2015ATF,
  title={A Tool for Semi-Automatic Generation and Maintenance of Taxonomies from Semi-Structured Documents},
  author={Marcello Leida},
  year={2015}
}