Graph structure in the Web
TLDR
The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery and the sociological phenomena which characterize its evolution.
  • 2,813
  • 237
A taxonomy of web search
  • A. Broder
  • Computer Science
  • SIGF
  • 1 September 2002
TLDR
We explore a taxonomy of web searches, which classifies queries by intent as navigational, informational, or transactional, and discuss how global search engines evolved to deal with web-specific needs.
  • 2,027
  • 236
On the resemblance and containment of documents
  • A. Broder
  • Mathematics, History
  • Proceedings. Compression and Complexity of…
  • 11 June 1997
TLDR
We define two mathematical notions, the resemblance r(A, B) and the containment c(B, A) of two documents, that seem to capture well the informal notions of "roughly the same" and "roughly contained."
  • 1,649
  • 187
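A minimal sketch of the two measures, assuming each document is reduced to its set of contiguous w-word "shingles" (the window size w and the whitespace tokenization are illustrative choices, not taken from the entry above). Resemblance is the Jaccard coefficient of the two shingle sets; the containment of A in B is the fraction of A's shingles that also occur in B.

```python
def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles of `text` (illustrative tokenization)."""
    words = text.split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Jaccard coefficient |S(A) & S(B)| / |S(A) | S(B)| of the shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def containment(a: str, b: str, w: int = 4) -> float:
    """How much of A is contained in B: |S(A) & S(B)| / |S(A)|.

    An empty A is treated as vacuously contained (returns 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa) if sa else 1.0
```

For example, with w = 2, "the quick brown fox jumps" has 4 shingles and the same sentence extended with "over the lazy dog" has 8, all 4 of the first among them: the resemblance is 0.5, while the shorter text is fully contained in the longer one (containment 1.0).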
Summary cache: a scalable wide-area web cache sharing protocol
TLDR
In this paper we demonstrate the benefits of cache sharing, measure the overhead of the existing protocols, and propose a new protocol called "summary cache".
  • 2,091
  • 180
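In summary cache, each proxy keeps a compact, possibly slightly stale summary of the cache directories of its peers and only queries a peer whose summary reports a likely hit. A minimal sketch of such a summary as a counting Bloom filter, which lets a URL be removed again when it is evicted; the class name, the table size m, the number of hash functions k, and the salted SHA-256 hashing are illustrative assumptions.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with small counters instead of bits, so keys can be removed."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, key: str) -> list[int]:
        # Derive k table indices from salted SHA-256 digests (an illustrative choice).
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.counters[p] += 1

    def remove(self, key: str) -> None:
        # Only call this for keys that were previously added; removing an absent
        # key can zero out counters shared with other keys (false negatives).
        for p in self._positions(key):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def might_contain(self, key: str) -> bool:
        # May return a false positive; never a false negative if remove() is used correctly.
        return all(self.counters[p] > 0 for p in self._positions(key))
```

A proxy would call `add` when it caches a URL and `remove` when it evicts one, and periodically ship the summary to its peers, which check `might_contain` before sending a query.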
Network Applications of Bloom Filters: A Survey
TLDR
We survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
  • 2,041
  • 162
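A central piece of the mathematics behind Bloom filters is the standard approximation of the false-positive rate: with m bits, n inserted items and k hash functions it is roughly (1 - e^(-kn/m))^k, which is minimized near k = (m/n) ln 2. A small sketch that evaluates both; the function names are hypothetical.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false-positive probability (1 - e^{-kn/m})^k of a Bloom filter
    with m bits, n inserted items and k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Integer number of hash functions near the minimizer (m/n) ln 2 of the approximation."""
    return max(1, round(m / n * math.log(2)))
```

With 8 bits per item (m = 8000, n = 1000), `optimal_k` returns 6 and the predicted false-positive rate is about 2.2%.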
Syntactic Clustering of the Web
TLDR
We have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web.
  • 1,454
  • 124
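A naive sketch of syntactic clustering, assuming documents are compared by the resemblance of their w-word shingle sets and that any pair at or above a threshold belongs to the same cluster (taking the transitive closure with union-find). This compares all pairs, which is only workable for small inputs; a web-scale system would need a sketch-based approach instead. The function names, threshold and window size are illustrative.

```python
from itertools import combinations


def shingles(text: str, w: int = 3) -> frozenset[tuple[str, ...]]:
    words = text.split()
    return frozenset(tuple(words[i:i + w]) for i in range(len(words) - w + 1))


def cluster(docs: dict[str, str], threshold: float = 0.5, w: int = 3) -> list[set[str]]:
    """Group documents whose pairwise shingle resemblance is >= threshold."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    sh = {d: shingles(t, w) for d, t in docs.items()}
    for a, b in combinations(docs, 2):  # O(n^2) pairs: illustration only
        union = sh[a] | sh[b]
        if union and len(sh[a] & sh[b]) / len(union) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for d in docs:
        groups.setdefault(find(d), set()).add(d)
    return list(groups.values())
```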
Efficient query evaluation using a two-level retrieval process
TLDR
We present an efficient two-level query evaluation method. At the first level, it iterates in parallel over the postings of the query terms and identifies candidate documents using an approximate evaluation that considers only partial information on term occurrences and no query-independent factors. At the second level, promising candidates are fully evaluated and their exact scores are computed.
  • 379
  • 75
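A simplified sketch of the two-level idea, assuming a document's score is the sum of per-term contributions and that each term has a precomputed upper bound on its contribution. The first level estimates a document's score from which query terms occur in it, using only those bounds; the second level computes the exact score only when the estimate could beat the current k-th best. For clarity it walks the sorted union of documents instead of advancing posting cursors with skipping, so it is not a faithful implementation of the paper's method, and the in-memory index layout is hypothetical.

```python
import heapq


def two_level_top_k(query: list[str], postings: dict[str, dict[int, float]], k: int):
    """Return (top-k list of (score, doc), number of full evaluations).

    postings maps term -> {doc_id: contribution of that term to the doc's score}."""
    if k <= 0:
        return [], 0
    lists = {t: postings.get(t, {}) for t in query}
    # Per-term upper bound on the contribution to any document.
    ub = {t: max(lst.values(), default=0.0) for t, lst in lists.items()}
    heap: list[tuple[float, int]] = []  # min-heap holding the current top k
    full_evals = 0
    for doc in sorted(set().union(*(lst.keys() for lst in lists.values()))):
        # Level 1: cheap estimate from term occurrence only.
        estimate = sum(ub[t] for t, lst in lists.items() if doc in lst)
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if estimate <= threshold:
            continue  # cannot enter the top k, so skip the expensive step
        # Level 2: full, exact evaluation.
        full_evals += 1
        exact = sum(lst[doc] for lst in lists.values() if doc in lst)
        if len(heap) < k:
            heapq.heappush(heap, (exact, doc))
        elif exact > heap[0][0]:
            heapq.heapreplace(heap, (exact, doc))
    return sorted(heap, reverse=True), full_evals
```

Because the bounds never underestimate a term's contribution, skipped documents could not have entered the top k, so the result matches exhaustive scoring while some exact evaluations are avoided.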
Balanced Allocations
TLDR
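When n balls are placed sequentially into n boxes, each into the less full of two boxes chosen at random, we show that with high probability the fullest box contains only ln ln n / ln 2 + O(1) balls, exponentially fewer than the roughly ln n / ln ln n obtained when each ball goes into a single random box.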
  • 738
  • 70
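A small simulation of the balanced-allocation process: each of n balls goes into the least loaded of d boxes chosen uniformly at random, and the function reports the fullest box. The function name and the fixed seed are illustrative; d = 1 is ordinary random placement and d = 2 is the two-choice process.

```python
import random


def max_load(n: int, d: int, seed: int = 0) -> int:
    """Throw n balls into n boxes; each ball goes to the least loaded of d boxes
    chosen independently and uniformly at random. Return the maximum load."""
    rng = random.Random(seed)
    boxes = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=boxes.__getitem__)  # ties go to the first candidate
        boxes[target] += 1
    return max(boxes)
```

For n = 100,000, the single-choice maximum is typically around 8, while two choices typically bring it down to about 4, in line with ln ln n / ln 2 ≈ 3.5 plus a constant.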
Min-Wise Independent Permutations
TLDR
We define and study the notion of min-wise independent families of permutations in the symmetric group, which are essential to the AltaVista web index software to detect near-duplicate documents.
  • 820
  • 69
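Min-wise independence is what makes the following estimator work: if π is drawn from a min-wise independent family, then Pr[min π(A) = min π(B)] equals the resemblance |A ∩ B| / |A ∪ B|. A minimal sketch, assuming random linear maps x ↦ (a·x + b) mod p as a practical stand-in for true min-wise independent permutations (they are only approximately min-wise independent), with sets of non-negative integers smaller than p. The function names are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; elements must lie in [0, P)


def minhash_signature(items: set[int], num_perm: int, seed: int = 0) -> list[int]:
    """Minimum of each of num_perm random linear permutations over a non-empty set.

    Signatures are only comparable when built with the same seed and num_perm,
    because that is what makes both sets use the same permutations."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_perm)]
    return [min((a * x + b) % P for x in items) for a, b in params]


def estimate_resemblance(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of permutations whose minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For A = {0, …, 999} and B = {500, …, 1499} the true resemblance is 500/1500 = 1/3, and a few hundred permutations typically estimate it to within a few hundredths.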
A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines
TLDR
We present a standardized, statistical way of measuring search engine coverage and overlap through random queries.
  • 458
  • 48
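The arithmetic behind relative size and overlap: if pages are sampled from engine A and a fraction x of them is also found in engine B, then x estimates |A ∩ B| / |A|; symmetrically, y estimates |A ∩ B| / |B|. Since both describe the same intersection, |A| / |B| ≈ y / x. A minimal sketch under the strong assumption that each engine can be sampled uniformly and checked directly, modeled here as hypothetical in-memory sets of URLs; sampling through random queries, as in the entry above, only approximates uniform sampling.

```python
import random


def estimate_overlap(index_a: set[str], index_b: set[str], samples: int, seed: int = 0):
    """Return (est. |A&B|/|A|, est. |A&B|/|B|, est. |A|/|B|) from uniform samples."""
    rng = random.Random(seed)
    # Sort so that sampling is reproducible regardless of set iteration order.
    pool_a, pool_b = sorted(index_a), sorted(index_b)
    frac_a_in_b = sum(rng.choice(pool_a) in index_b for _ in range(samples)) / samples
    frac_b_in_a = sum(rng.choice(pool_b) in index_a for _ in range(samples)) / samples
    # |A & B| = frac_a_in_b * |A| = frac_b_in_a * |B|, hence |A|/|B| = frac_b_in_a / frac_a_in_b.
    size_ratio = frac_b_in_a / frac_a_in_b if frac_a_in_b else float("inf")
    return frac_a_in_b, frac_b_in_a, size_ratio
```

If A holds 3,000 pages and B holds 1,000 pages that are all in A, about a third of A's samples are found in B, every sample of B is found in A, and the estimated size ratio comes out close to 3.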