Nutch: an Open-Source Platform for Web Search

Abstract

Nutch is an open-source project that provides both complete Web search software and a platform for developing novel Web search methods. Nutch is built on a distributed storage and computing foundation, so that every operation scales to very large collections. Core algorithms crawl, parse and index Web-based data. Plugins extend functionality at various points, including network protocols, document formats, indexing schemas and query operators.
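
As a rough illustration of the plugin idea described above, the following minimal sketch shows how a registry of extension points might let a fetch, parse and index loop stay independent of any particular protocol or document format. It is not Nutch's actual API: the names PluginRegistry, EXTENSION_POINTS, fetch_from_memory, parse_plain_text and build_index, as well as the mem:// scheme, are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List, Set

# Hypothetical extension points, loosely named after the areas the abstract lists.
EXTENSION_POINTS = ("protocol", "parser", "indexing_filter", "query_operator")


class PluginRegistry:
    """Maps each extension point to a set of named plugin callables."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Dict[str, Callable]] = {
            point: {} for point in EXTENSION_POINTS
        }

    def register(self, point: str, key: str, plugin: Callable) -> None:
        if point not in self._plugins:
            raise KeyError(f"unknown extension point: {point}")
        self._plugins[point][key] = plugin

    def get(self, point: str, key: str) -> Callable:
        return self._plugins[point][key]


# Canned pages standing in for the Web (assumption for this sketch).
PAGES = {
    "mem://a": "Open source Web search",
    "mem://b": "Web crawl and index",
}


def fetch_from_memory(url: str) -> str:
    """Stand-in for a network protocol plugin: returns canned content."""
    return PAGES[url]


def parse_plain_text(raw: str) -> List[str]:
    """Stand-in for a document format plugin: lowercases and splits on whitespace."""
    return raw.lower().split()


def build_index(registry: PluginRegistry, urls: List[str]) -> Dict[str, Set[str]]:
    """Fetch, parse and index each URL using whichever plugins are registered."""
    index: Dict[str, Set[str]] = {}
    for url in urls:
        scheme = url.split("://", 1)[0]
        raw = registry.get("protocol", scheme)(url)
        tokens = registry.get("parser", "text/plain")(raw)
        for token in tokens:
            index.setdefault(token, set()).add(url)
    return index


if __name__ == "__main__":
    registry = PluginRegistry()
    registry.register("protocol", "mem", fetch_from_memory)
    registry.register("parser", "text/plain", parse_plain_text)
    print(build_index(registry, ["mem://a", "mem://b"]))
```

In this sketch the core loop only looks plugins up by key, so supporting a new protocol or format would mean registering another callable rather than changing build_index.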

Cite this paper

@inproceedings{Cutting2005NutchAO, title={Nutch: an Open-Source Platform for Web Search}, author={Doug Cutting}, year={2005} }