MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data

Abstract

The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite to enabling large-scale query answering over web sources. We contrast our approach with conventional web crawlers, and describe and evaluate a five-step pipelined architecture to crawl and index data from both the traditional and the Semantic Web.

DOI: 10.1007/11926078_19

Statistics

[Chart: Citations per Year, 2006 to 2017; vertical axis from 0 to 30]

Semantic Scholar estimates that this publication has 101 citations based on the available data.

Cite this paper

@inproceedings{Harth2006MultiCrawlerAP, title={MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data}, author={Andreas Harth and J{\"u}rgen Umbrich and Stefan Decker}, booktitle={International Semantic Web Conference}, year={2006} }