Shian-Hua Lin

In this paper, we present a system that extracts and generalizes terms from Internet documents to represent classification knowledge of a given class hierarchy. We propose a measurement to evaluate the importance of a term with respect to a class in the class hierarchy, and denote it as support. With a given threshold, terms with high supports are sifted as …
We study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (also referred to as TOC, i.e., table of contents, pages) and a set of article pages linked by these TOC pages. Based on the Hyperlink Induced Topics …
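For context, the hyperlink-based analysis named in the abstract above belongs to the HITS family, which scores pages as hubs (pages that link to good content) and authorities (pages that good hubs link to). The sketch below is plain textbook HITS on a toy link graph, not the paper's method, which is truncated here; the page names `toc`, `a1`, `a2`, `a3` and the function `hits` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def hits(links, iterations=50):
    """Textbook HITS. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}

        # Normalize both vectors to unit length so scores stay bounded.
        a_norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Toy site (assumed data): one index page linking to three articles,
# plus one article that links to another.
toy_links = {
    "toc": ["a1", "a2", "a3"],
    "a1": ["a2"],
}

hub, auth = hits(toy_links)
best_hub = max(hub, key=hub.get)      # page that behaves most like an index page
best_auth = max(auth, key=auth.get)   # page that behaves most like an article
```

In this toy graph the index-like page receives the highest hub score, which is the intuition behind treating TOC pages as hubs and article pages as authorities.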
Scientific research reports require a great deal of citation; an automatic citation tool would therefore be of great use. Because citation formats follow numerous models, it is difficult to automatically transform semi-structured citation data into structured citations. Some digital library institutes, like ResearchIndex (CiteSeer) or OpCit, have attempted automatic citation …
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier, and a two-phase search engine. The knowledge acquisition process of ACIRD …
In this paper, we present a collaborative, intelligent system, built on ACIRD, for searching documents across multiple Web sites on the Internet. ACIRD is a system that automatically learns classification knowledge from Web pages and applies that knowledge to automatically classify Web pages into classes in a class hierarchy. A data mining technique is used to learn the …
In this paper, we study the problem of mining the informative structure of a news Web site which consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (also referred to as TOC, i.e., table of contents, pages) and a set of article pages linked by TOC pages through informative links. It is …