Ruth Yuee Zhang

Learn More
There is a vast amount of valuable information in HTML documents, widely distributed across the World Wide Web and across corporate intranets. Unfortunately, HTML is mainly presentation oriented and hard to query. In this paper, we develop a system to extract desired information (records) from thousands of HTML documents, starting from a small set of(More)
  • 1