We present Symphony, a novel protocol for maintaining distributed hash tables in a wide area network. The key idea is to arrange all participants along a ring and equip them with long distance contacts drawn from a family of harmonic distributions. Through simulation, we demonstrate that our construction is scalable, flexible, stable in the presence of …
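As a rough illustration of the long distance contacts mentioned in the Symphony abstract, the sketch below draws ring distances from one harmonic-style density, p(x) proportional to 1/x on [1/n, 1], using inverse-transform sampling. The exact density, the function names `harmonic_distance` and `long_links`, and the convention that node IDs live in the unit interval are assumptions made for illustration; the truncated abstract does not specify them.

```python
import random

def harmonic_distance(n, rng):
    """Draw a ring distance in [1/n, 1) with density proportional to 1/x.

    Assumed form: the CDF is F(x) = 1 + ln(x) / ln(n), so the inverse
    transform of a uniform u in [0, 1) is x = n ** (u - 1).
    """
    u = rng.random()
    return n ** (u - 1.0)

def long_links(node_id, n, k, rng):
    """Return k hypothetical long-distance contact points for a node.

    node_id is a position on the unit ring [0, 1); each contact point is
    the node's position plus a harmonic distance, wrapped around the ring.
    """
    return [(node_id + harmonic_distance(n, rng)) % 1.0 for _ in range(k)]

if __name__ == "__main__":
    rng = random.Random(42)
    print(long_links(0.25, n=1024, k=4, rng=rng))
```

Short distances are drawn far more often than long ones under this density, which is the qualitative behaviour a 1/x law is meant to give; how many contacts each node keeps is left as the free parameter `k`.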
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the …
We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple …
We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay topology in which most connections are short-distance, connecting pairs of sites with similar …
This paper describes our ongoing work developing the Stanford Stream Data Manager (STREAM), a system for executing continuous queries over multiple continuous data streams. The STREAM system supports a declarative query language, and it copes with high data rates and query workloads by providing approximate answers when resources are limited. This paper …
Several peer-to-peer networks are based upon randomized graph topologies that permit efficient greedy routing, e.g., randomized hypercubes, randomized Chord, skip-graphs and constructions based upon small-world percolation networks. In each of these networks, a node has out-degree Θ(log n), where n denotes the total number of nodes, and …
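To make the notion of greedy routing in the abstract above concrete, here is a minimal sketch on a ring of n integer node IDs: at each hop the message is forwarded to the neighbor with the smallest remaining clockwise distance to the target. For reproducibility the sketch uses deterministic power-of-two links (about log n per node) rather than the randomized topologies the abstract discusses; the names `clockwise`, `neighbors` and `greedy_route` are hypothetical.

```python
def clockwise(a, b, n):
    """Clockwise distance from node a to node b on a ring of n nodes."""
    return (b - a) % n

def neighbors(node, n):
    """Assumed link set: nodes at clockwise offsets 1, 2, 4, ... below n."""
    out = []
    step = 1
    while step < n:
        out.append((node + step) % n)
        step *= 2
    return out

def greedy_route(src, dst, n):
    """Forward to whichever neighbor is closest (clockwise) to dst."""
    path = [src]
    cur = src
    while cur != dst:
        cur = min(neighbors(cur, n), key=lambda v: clockwise(v, dst, n))
        path.append(cur)
    return path

if __name__ == "__main__":
    print(greedy_route(0, 13, 16))  # [0, 8, 12, 13]
```

Because every node links to its immediate successor, each hop strictly reduces the remaining clockwise distance, so the loop always terminates.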
Routing topologies for distributed hashing in peer-to-peer networks are classified into two categories: deterministic and randomized. A general technique for constructing deterministic routing topologies is presented. Using this technique, classical parallel interconnection networks can be adapted to handle the dynamic nature of participants in peer-to-peer …
We present a low-cost, decentralized algorithm for ID management in distributed hash tables (DHTs) managed by a dynamic set of hosts. Each host is assigned an ID in the unit interval [0, 1). At any time, the set of IDs splits the interval into disjoint partitions. Hosts do not possess global knowledge of other IDs in the system. The challenge then is to …
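The statement that the set of IDs splits [0, 1) into disjoint partitions can be sketched as follows. The sketch assumes distinct IDs and the convention that each host owns the arc from its own ID up to the next ID clockwise, wrapping around at 1; that convention and the names `partitions` and `imbalance` are illustrative assumptions, not details taken from the abstract. It uses a global view of all IDs, which real hosts do not have, purely to show what the partitions look like.

```python
def partitions(ids):
    """Return (start, end, length) for each arc induced by the given IDs.

    Assumes the IDs are distinct values in [0, 1). The arc of the largest
    ID wraps around through 1.0 back to the smallest ID.
    """
    s = sorted(ids)
    if len(s) == 1:
        return [(s[0], s[0], 1.0)]  # a lone host owns the whole ring
    arcs = []
    for i, start in enumerate(s):
        end = s[(i + 1) % len(s)]
        arcs.append((start, end, (end - start) % 1.0))
    return arcs

def imbalance(ids):
    """Ratio of the largest partition to the smallest one."""
    lengths = [length for _, _, length in partitions(ids)]
    return max(lengths) / min(lengths)

if __name__ == "__main__":
    hosts = [0.5, 0.0, 0.25]
    for arc in partitions(hosts):
        print(arc)
    print(imbalance(hosts))  # 2.0
```

A ratio such as `imbalance` is one simple way to quantify how evenly the interval is shared among hosts.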