Nutch: A Flexible and Scalable Open-Source Web Search Engine

Author: Rohit Khare
Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest; one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; on a personal scale…
This paper has 166 citations.


Publications citing this paper.

A distributed search engine based on a re-ranking algorithm model

2015 10th International Conference on Computer Science & Education (ICCSE) • 2015

Harvesting Information from Heterogeneous Sources

2011 European Intelligence and Security Informatics Conference • 2011

heteroHarvest: Harvesting information from heterogeneous sources

Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics • 2011

A Cleaning Algorithm for Noiseless Opinion Mining Corpus Construction

2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA) • 2018

Realistic Traffic Generation for Web Robots

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) • 2017


