Wachirawut Thamviset

In this paper, we propose an information extraction (IE) system for extracting data records from semi-structured documents on the Deep Web using a technique called Repetitive Subject Pattern. The technique is based on the hypothesis that data records in a web page must have a subject item, and the repetitive pattern of the subject...
Data records on a dynamic web page are often generated from databases by server-side scripts using fixed templates or layouts. Generally, each data record on the web page has a subject item that can be used to identify it. This paper reports a novel semi-supervised information extraction system that lets end-users give only one subject item of...
Database websites generally provide interfaces that give users access to their structured data. These data are usually presented as data records in a coherent region of a result page. However, the page usually contains not only the data region but also other extraneous regions. Therefore, the important tasks for extracting data...