We propose a novel extraction approach that exploits <i>content redundancy</i> on the web to extract structured data from <i>template-based</i> web sites. We start by populating a seed database with records extracted from a few initial sites. We then identify values within the pages of each new site that match attribute values contained in the seed set of ...
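The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes seed records are flat attribute-to-value dictionaries, that a page has already been reduced to a list of text nodes, and that a match means string equality after simple normalization. The names `normalize`, `build_value_index`, and `match_page_values`, and the sample data, are hypothetical. Whatever the approach does with the matches afterwards is not covered here.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def build_value_index(seed_records):
    """Map each normalized seed value to the set of attribute names it appears under."""
    index = defaultdict(set)
    for record in seed_records:
        for attribute, value in record.items():
            index[normalize(value)].add(attribute)
    return index


def match_page_values(text_nodes, value_index):
    """Return (position, text, attributes) for each page text node found in the seed set."""
    matches = []
    for position, text in enumerate(text_nodes):
        attributes = value_index.get(normalize(text))
        if attributes:
            matches.append((position, text, sorted(attributes)))
    return matches


# Hypothetical seed database, as if extracted from a few initial sites.
seed_records = [
    {"name": "Blue Bottle Cafe", "city": "Oakland"},
    {"name": "Tartine Bakery", "city": "San Francisco"},
]

# Hypothetical text nodes taken from one page of a new, unseen site.
page_text_nodes = ["Home", "Tartine  Bakery", "san francisco", "Open daily"]

value_index = build_value_index(seed_records)
matches = match_page_values(page_text_nodes, value_index)
for position, text, attributes in matches:
    print(position, repr(text), attributes)
```

Exact matching after normalization is the simplest possible choice and is used here only to keep the sketch short; a real system would likely need approximate matching to cope with formatting differences between sites.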