Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites

Abstract

We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.
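The first two stages named in the abstract (extracting a domain terminology, then arranging complex terms hierarchically) can be illustrated with a toy sketch. This is not OntoLearn's implementation: it assumes plain two-word frequency counting, a hypothetical stopword list, and a simple head-word heuristic (a term is filed under its last word). It leaves out the WordNet-based semantic interpretation and the structural semantic interconnections algorithm entirely. All function names and the sample documents are illustrative.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, just large enough for the sample documents.
STOPWORDS = {"the", "a", "and", "is", "in"}


def extract_terms(documents, min_count=2):
    """Return two-word candidate terms seen at least min_count times.

    Assumption: raw frequency stands in for the domain-relevance
    filtering a real terminology extractor would apply.
    """
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        for first, second in zip(words, words[1:]):
            # Skip pairs that contain a stopword.
            if first in STOPWORDS or second in STOPWORDS:
                continue
            counts[(first, second)] += 1
    return sorted(" ".join(pair) for pair, n in counts.items() if n >= min_count)


def build_hierarchy(terms):
    """Group each complex term under its head word (its last word)."""
    tree = defaultdict(list)
    for term in terms:
        head = term.split()[-1]
        tree[head].append(term)
    return dict(tree)


docs = [
    "The bus service runs daily. The bus service is cheap.",
    "A boat service and a bus service share the pier.",
    "The boat service stops in winter.",
]

terms = extract_terms(docs)
hierarchy = build_hierarchy(terms)
print(terms)      # ['boat service', 'bus service']
print(hierarchy)  # {'service': ['boat service', 'bus service']}
```

In this sketch "bus service" and "boat service" both end up under "service" purely on string form. The approach described in the abstract goes further by choosing a WordNet concept for each word and identifying the conceptual relations between them.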

DOI: 10.1162/089120104323093276

Semantic Scholar estimates that this publication has 393 citations based on the available data.

Cite this paper

@article{Navigli2004LearningDO,
  title={Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites},
  author={Roberto Navigli and Paola Velardi},
  journal={Computational Linguistics},
  year={2004},
  volume={30},
  pages={151-179}
}