We present Symphony, a novel protocol for maintaining distributed hash tables in a wide area network. The key idea is to arrange all participants along a ring and equip them with long distance contacts drawn from a family of harmonic distributions. Through simulation, we demonstrate that our construction is scalable, flexible, stable in the presence of …
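The idea of drawing long distance contacts from a harmonic distribution can be sketched as follows. This is a minimal illustration, not the protocol itself: it assumes a ring of unit perimeter, an estimate `n` of the number of participants, and the density 1/(x ln n) on [1/n, 1], which is one member of the harmonic family. The function names `harmonic_distance` and `long_distance_contacts` are hypothetical.

```python
import math
import random


def harmonic_distance(n, rng):
    """Draw a ring distance x in [1/n, 1) with density 1 / (x * ln n).

    Assumption: the CDF is F(x) = 1 + ln(x) / ln(n), so inverse-transform
    sampling gives x = n ** (u - 1) for u uniform in [0, 1).
    """
    u = rng.random()
    return math.exp(math.log(n) * (u - 1.0))


def long_distance_contacts(node_id, n, k, rng):
    """Return k ring positions (hypothetical helper) at harmonic distances
    clockwise from node_id, wrapping around the unit ring."""
    return [(node_id + harmonic_distance(n, rng)) % 1.0 for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Four long distance targets for a node at position 0.9 among ~1024 nodes.
    print(long_distance_contacts(0.9, 1024, 4, rng))
```

In a real deployment each drawn position would be resolved to the node currently responsible for it; that lookup step is omitted here.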
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the …
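To make the notion of a near-duplicate concrete, here is a generic sketch that compares two pages by the Jaccard similarity of their word shingles. This is a textbook technique swapped in for illustration; the truncated abstract does not say which method the paper uses. The shingle width `k`, the `threshold`, and all function names are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.9, k=3):
    """Hypothetical decision rule: similar enough means near-duplicate."""
    return jaccard(a, b, k) >= threshold


if __name__ == "__main__":
    page = "the quick brown fox jumps over the lazy dog near the river bank"
    variant = page + " advertisement"
    print(jaccard(page, variant))
```

Pairwise comparison like this does not scale to a full crawl; it only shows what "differ in a very small portion" means numerically.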
Several peer-to-peer networks are based upon randomized graph topologies that permit efficient greedy routing, e.g., randomized hypercubes, randomized Chord, skip-graphs and constructions based upon small-world percolation networks. In each of these networks, a node has out-degree Θ(log n), where n denotes the total number of nodes, and …
We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay topology in which most connections are short-distance, connecting pairs of sites with similar …
We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple …
In a recent paper [MRL98], we had described a general framework for single pass approximate quantile finding algorithms. This framework included several known algorithms as special cases. We had identified a new algorithm, within the framework, which had a significantly smaller requirement for main memory than other known algorithms. In this paper, we …
This paper describes our ongoing work developing the Stanford Stream Data Manager (STREAM), a system for executing continuous queries over multiple continuous data streams. The STREAM system supports a declarative query language, and it copes with high data rates and query workloads by providing approximate answers when resources are limited. This paper …
We present a low-cost, decentralized algorithm for ID management in distributed hash tables (DHTs) managed by a dynamic set of hosts. Each host is assigned an ID in the unit interval [0, 1). At any time, the set of IDs splits the interval into disjoint partitions. Hosts do not possess global knowledge of other IDs in the system. The challenge then is to …
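How a set of IDs splits the unit interval into disjoint partitions can be sketched as below. The sketch assumes the interval wraps around as a ring and that each host owns the arc from its own ID up to the next ID clockwise; the abstract does not state this ownership rule, and `partition_lengths` is a hypothetical name. It also uses global knowledge of all IDs, which real hosts do not have, so it is a diagnostic view only.

```python
def partition_lengths(ids):
    """Return the arc length owned by each ID, in sorted-ID order.

    Assumption: IDs lie in [0, 1) and each ID owns the arc from itself to
    the next ID clockwise, with the last ID wrapping around to the first.
    """
    ordered = sorted(ids)
    n = len(ordered)
    if n == 0:
        return []
    if n == 1:
        return [1.0]  # a single host owns the whole interval
    return [(ordered[(i + 1) % n] - ordered[i]) % 1.0 for i in range(n)]


if __name__ == "__main__":
    lengths = partition_lengths([0.5, 0.0, 0.25])
    print(lengths)  # arcs for IDs 0.0, 0.25, 0.5
```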