The Anatomy of a Large-Scale Hypertextual Web Search Engine

@article{Brin1998TheAO,
  title={The Anatomy of a Large-Scale Hypertextual Web Search Engine},
  author={Sergey Brin and Lawrence Page},
  journal={Comput. Networks},
  year={1998},
  volume={30},
  pages={107--117}
}

A parallel view for search engines

TLDR
This paper describes the cooperative work between the Crawler, the Indexer, and the Searcher, and notes that scalability is a concern during index construction as well as during query processing.

Mining the Web's Link Structure

TLDR
Clever is a search engine that analyzes hyperlinks to uncover two types of pages: authorities, which provide the best source of information on a given topic, and hubs, which provide collections of links to authorities.

An Analytical Study of Intelligent Parallel Web Crawler

TLDR
Standard crawler architecture and the modules of a search engine, used to access information from WWW, are described.

A Review of Web Search Engine Applications: InfoSpider, Waco, WebComb and ContentUsageAnts Models

TLDR
The way in which search engines work is presented, the basic tasks of a search engine are described, and an overview is given of some models that have been used to improve the way they work, such as the Waco, InfoSpider, ContentUsageAnts, and WebComb models.

CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA

TLDR
The basics of crawlers and the commonly used techniques for crawling the web are discussed; pseudocode for basic crawling algorithms and their implementation in C are presented along with simplified flowcharts, and a comparison study is given in a table.

Link-Based Web Analysis: PageRank and HITS Algorithms

TLDR
Algorithms to improve the performance of vertical search engine spiders were investigated: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that used a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.

Mining the Web's Link Structure

TLDR
The creation of a hyperlink by the author of a Web page represents an implicit endorsement of the page being pointed to; by mining the collective judgment contained in the set of such endorsements, the Clever system can gain a richer understanding of the relevance and quality of the Web's contents.

Web Crawler Architecture

  • Marc Najork
  • Computer Science
    Encyclopedia of Database Systems
  • 2009
TLDR
In order to crawl a substantial fraction of the “surface web” in a reasonable amount of time, web crawlers must download thousands of pages per second, and are typically distributed over tens or hundreds of computers.

The Design & Implementation of a Small Scale Crawler based Web Search Engine

  • Iti Shrunkhla
  • Computer Science
    2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT)
  • 2019
TLDR
A framework for a small-scale, crawler-based web search engine is being implemented to gather content from the web; it is designed to search and store web content efficiently, producing search results as effective as those of other similar structures.

AN APPROACH TO DESIGN INCREMENTAL PARALLEL WEBCRAWLER

TLDR
A novel multi-threaded (MT) server-based architecture for an incremental parallel web crawler has been designed that helps to reduce overlap, quality, and network bandwidth problems, and web page change detection methods have been developed to refresh web documents.

References

Lycos: design choices in an Internet search service

TLDR
The history and precursors of the Lycos system for collecting, storing, and retrieving information about pages on the Web are outlined and some of the design choices made in building this Web indexer are discussed.

ParaSite: Mining Structural Information on the Web

Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology

TLDR
The authors of this study found that Alta Vista outperformed Excite and Lycos in both search facilities and retrieval performance although Lycos had the largest coverage of Web resources among the three Web search engines examined.

GENVL and WWWW: Tools for taming the Web

HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering

TLDR
Experience with HyPursuit suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies, and preliminary results on clustering based on both document contents and hyperlink structures are encouraging.

Queries and computation on the web

TLDR
Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative.

The PageRank Citation Ranking: Bringing Order to the Web

TLDR
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.