Ashwin Tengli

I. INTRODUCTION Vertex is a system developed at Yahoo! for extracting structured records at Web scale from template-based Web pages. As an example, consider the page shown in Figure 1 for restaurant "Chimichurri Grill" from the aggregator Web site. The page contains a wealth of information including details like the restaurant name, category, ...
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database with records extracted from a few initial sites. We then identify values within the pages of each new site that match attribute values contained in the seed set of ...
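The matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`normalize`, `build_seed_index`, `match_page_values`), the exact-match-after-normalization rule, and the sample category value are all assumptions made for illustration. It only shows the general idea of looking up text values from a new site's page in an index built from seed records.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def build_seed_index(seed_records):
    """Map each normalized seed value to the set of attribute names it appears under.

    seed_records is a list of dicts such as {"name": ..., "category": ...}
    (a hypothetical record shape, not one taken from the paper).
    """
    index = defaultdict(set)
    for record in seed_records:
        for attribute, value in record.items():
            index[normalize(value)].add(attribute)
    return index


def match_page_values(text_nodes, seed_index):
    """Return (position, original text, candidate attributes) for nodes found in the seed index."""
    matches = []
    for position, node in enumerate(text_nodes):
        attributes = seed_index.get(normalize(node))
        if attributes:
            matches.append((position, node, sorted(attributes)))
    return matches


# Hypothetical seed record; the category value is made up for illustration.
seed_records = [{"name": "Chimichurri Grill", "category": "Steakhouse"}]
seed_index = build_seed_index(seed_records)

# Hypothetical text nodes taken from a page of a new, unseen site.
page_nodes = ["  chimichurri   GRILL", "Open daily", "Steakhouse"]
for position, text, attributes in match_page_values(page_nodes, seed_index):
    print(position, repr(text), attributes)
```

Exact matching after normalization is the simplest possible choice here; a practical system would likely need approximate matching and a way to resolve values that appear under more than one attribute.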