A Machine Learning Framework for Automatically Annotating Web Pages with Simple HTML Ontology Extension (SHOE)

Abstract

With enormous amounts of information added to the Internet every second, manually maintaining a knowledge base of its content is a hopeless task. A reasonable remedy is to make the Internet “machine understandable.” To this end, Heflin et al. proposed an HTML-based knowledge representation language called Simple HTML Ontology Extension (SHOE). SHOE can be used in many application domains, but it requires users to annotate web pages manually. To overcome this shortcoming, we created a machine learning framework called AutoSHOE that automatically annotates web pages with SHOE. With this framework, users can easily collect SHOE-annotated pages as training data, experiment with different feature selection methods and learning algorithms to find the best approach for learning a particular ontology, and automatically annotate new web pages with trained classifiers and rule sets. In addition, AutoSHOE allows new feature selectors and learners to be easily plugged into the system and run anywhere through the web. We present the AutoSHOE architecture and then discuss experimental results of our proof-of-concept design.
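
The workflow described in the abstract (pluggable feature selectors and learners, training on annotated pages, then annotating new pages) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the registry names, the `bag_of_words` selector, the `word_vote` learner, and the `annotate` helper are hypothetical and are not taken from AutoSHOE, and the emitted `CATEGORY` tag is only a simplified stand-in for SHOE-style markup.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical plug-in registries (illustrative names, not AutoSHOE's API).
FEATURE_SELECTORS: Dict[str, Callable[[str], List[str]]] = {}
LEARNERS: Dict[str, Callable] = {}


def register(registry: dict, name: str):
    """Decorator that plugs a function into a registry under a name."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(FEATURE_SELECTORS, "bag_of_words")
def bag_of_words(text: str) -> List[str]:
    """Lower-cased words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


@register(LEARNERS, "word_vote")
def train_word_vote(examples: List[Tuple[List[str], str]]):
    """Toy learner: count features per label, classify by summed counts."""
    counts: Dict[str, Counter] = {}
    for features, label in examples:
        counts.setdefault(label, Counter()).update(features)

    def classify(features: List[str]) -> str:
        scores = {lab: sum(c[f] for f in features) for lab, c in counts.items()}
        return max(scores, key=scores.get)

    return classify


def annotate(text: str, category: str) -> str:
    """Prepend a simplified, SHOE-style category tag to the page text."""
    return f'<CATEGORY NAME="{category}">\n{text}'


# Pick a selector and a learner by name, train, then annotate a new page.
select = FEATURE_SELECTORS["bag_of_words"]
training_pages = [
    ("Professor of computer science, teaches courses", "Professor"),
    ("Graduate student working on thesis", "Student"),
]
classify = LEARNERS["word_vote"]([(select(t), lab) for t, lab in training_pages])

new_page = "She teaches two courses this term"
annotated = annotate(new_page, classify(select(new_page)))
```

Looking components up by name is what would let a user swap in a different feature selector or learner without touching the rest of the pipeline, which is the property the abstract emphasizes.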

Cite this paper

@inproceedings{Lin2000AML, title={A Machine Learning Framework for Automatically Annotating Web Pages with Simple HTML Ontology Extension (SHOE)}, author={Qingfeng Lin and Stephen D. Scott and Sharad C. Seth}, year={2000} }