It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. In this paper we introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
String matching is a very common problem. We are searching for a string $P = p_1 p_2 \ldots p_m$ inside a large text file $T = t_1 t_2 \ldots t_n$, both sequences of characters from a finite character set $\Sigma$. The characters may be English characters in a text file, DNA base pairs, lines of source code, angles between edges in polygons, machines or machine ...
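One well-known way to search for $P$ in $T$ while allowing errors is bit-parallel Shift-And (often called bitap), extended to tolerate up to $k$ insertions, deletions, or substitutions. The excerpt above stops before describing any algorithm, so the following is only a minimal sketch of that technique, not a reproduction of the paper's method; the function name `approx_search` is hypothetical, and indices are 0-based rather than the $p_1 \ldots p_m$ numbering used above.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return every 0-based text index where a match of pattern with at most
    k insertions, deletions, or substitutions ends (bit-parallel Shift-And sketch)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    accept = 1 << (m - 1)  # bit meaning "the whole pattern matched"
    # masks[c] has bit i set when pattern[i] == c
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[j] bit i: pattern[0..i] matches a suffix of the text read so far with <= j errors.
    # Before any text is read, pattern[0..i] matches the empty string after i + 1 deletions.
    R = [((1 << j) - 1) & full for j in range(k + 1)]
    ends = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                   # R[j-1] as it was before reading ch
        R[0] = ((R[0] << 1) | 1) & b      # exact Shift-And step
        for j in range(1, k + 1):
            old = R[j]
            R[j] = (
                (((old << 1) | 1) & b)    # pattern[i] matches ch
                | prev_old                # ch is an extra (inserted) text character
                | ((prev_old << 1) | 1)   # pattern[i] is substituted by ch
                | (R[j - 1] << 1)         # pattern[i] is skipped (deleted)
            ) & full
            prev_old = old
        if R[k] & accept:
            ends.append(pos)
    return ends
```

With `k = 0` this reduces to exact Shift-And matching; each extra error level costs one more machine word per text character, which is why the approach suits short patterns.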
We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time ...
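The abstract does not say how sif detects common pieces. A minimal sketch of one generic approach, assuming fixed-size byte windows hashed with `zlib.crc32` as "pieces", is shown below. The functions `fingerprints` and `similar_pairs` and the parameters `window`, `sample_mod`, and `threshold` are hypothetical. Similarity is measured as the fraction of the smaller file's fingerprints that also occur in the other file, so a file contained in another scores 1.0; an inverted index from fingerprint to files avoids comparing every pair of files directly.

```python
import zlib
from collections import Counter, defaultdict
from itertools import combinations


def fingerprints(data: bytes, window: int = 8, sample_mod: int = 1) -> set[int]:
    """Hash every `window`-byte substring; keep hashes divisible by sample_mod.
    Sampling by hash value (not position) keeps the same samples if text moves."""
    fps = set()
    for i in range(len(data) - window + 1):
        h = zlib.crc32(data[i:i + window])
        if h % sample_mod == 0:
            fps.add(h)
    return fps


def similar_pairs(files: dict[str, bytes], threshold: float = 0.5,
                  window: int = 8, sample_mod: int = 1) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for file pairs whose shared-fingerprint
    fraction, relative to the smaller fingerprint set, reaches threshold."""
    fp = {name: fingerprints(data, window, sample_mod) for name, data in files.items()}
    owners: dict[int, list[str]] = defaultdict(list)  # fingerprint -> files containing it
    for name in sorted(fp):
        for h in fp[name]:
            owners[h].append(name)
    shared: Counter = Counter()
    for names in owners.values():
        for a, b in combinations(names, 2):  # names are sorted, so a < b
            shared[(a, b)] += 1
    result = []
    for (a, b), count in shared.items():
        score = count / min(len(fp[a]), len(fp[b]))
        if score >= threshold:
            result.append((a, b, score))
    return sorted(result)
```

Raising `sample_mod` stores fewer fingerprints per file at the cost of a noisier score, a trade-off worth tuning for a large file system.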
We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance ...
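The property that gives fixed-queries trees their name is that every node on the same level is split by the same key, so a search computes only one distance per level before reaching the leaves. A minimal sketch under stated assumptions follows: integer-valued edit distance, caller-chosen keys, and leaves holding at most `bucket_size` items. The class `FQTree`, the helper `levenshtein`, and these parameters are hypothetical. Pruning rests on $|d(q,k) - d(x,k)| \le d(q,x)$; edit distance is also symmetric, so this bound follows directly from the triangle inequality.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class FQTree:
    """Fixed-queries tree sketch: level l partitions items by dist(item, keys[l])."""

    def __init__(self, items, keys, dist, bucket_size: int = 2):
        self.keys, self.dist, self.bucket_size = list(keys), dist, bucket_size
        self.root = self._build(list(items), 0)

    def _build(self, items, level):
        if len(items) <= self.bucket_size or level == len(self.keys):
            return items  # leaf: a plain list of items
        groups = {}
        for x in items:
            groups.setdefault(self.dist(x, self.keys[level]), []).append(x)
        # Internal node: distance label -> subtree
        return {d: self._build(g, level + 1) for d, g in groups.items()}

    def range_search(self, q, r):
        """All items x with dist(q, x) <= r."""
        dq = [self.dist(q, key) for key in self.keys]  # one distance per level
        out = []

        def visit(node, level):
            if isinstance(node, list):
                out.extend(x for x in node if self.dist(q, x) <= r)
                return
            for d, child in node.items():
                if abs(d - dq[level]) <= r:  # otherwise no item below can be within r
                    visit(child, level + 1)

        visit(self.root, 0)
        return out
```

Usage: `FQTree(words, ["cat", "dog"], levenshtein).range_search("cot", 1)` returns the words within one edit of "cot", visiting only subtrees whose labels survive the pruning test.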
A new text compression scheme is presented in this article. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified; ...
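The abstract is cut off before it describes the compression itself. The sketch below illustrates the general idea of compressing the pattern instead of decompressing the text, with Python's `bytes.find` standing in as the black-box matcher. It assumes a hypothetical scheme in which chosen character pairs in 7-bit ASCII text are replaced by single bytes 128 to 255, and in which no character is both the first and the second element of a chosen pair. The class `PairCodec`, its methods, and that disjointness rule are assumptions of this sketch, not details taken from the article. Under that rule each byte's encoding depends only on its immediate neighbours, so only the pattern's first and last characters can be ambiguous, and those are checked against the adjacent compressed bytes.

```python
def count_overlapping(haystack: bytes, needle: bytes) -> int:
    """Count (possibly overlapping) occurrences using bytes.find as the black box."""
    count, pos = 0, haystack.find(needle)
    while pos != -1:
        count += 1
        pos = haystack.find(needle, pos + 1)
    return count


class PairCodec:
    """Hypothetical pair-to-byte codec for 7-bit ASCII text."""

    def __init__(self, pairs: list[str]):
        if len(pairs) > 128:
            raise ValueError("at most 128 pair codes fit in bytes 128..255")
        self.code = {(ord(p[0]), ord(p[1])): 128 + i for i, p in enumerate(pairs)}
        self.expand = {v: bytes(k) for k, v in self.code.items()}
        self.first = {a for a, _ in self.code}
        self.second = {b for _, b in self.code}
        # Disjoint sets: whether a byte joins a pair depends only on its neighbours.
        if self.first & self.second:
            raise ValueError("first and second pair characters must be disjoint")

    def compress(self, text: bytes) -> bytes:
        out, i = bytearray(), 0
        while i < len(text):
            if text[i] >= 128:
                raise ValueError("sketch assumes 7-bit ASCII input")
            if i + 1 < len(text) and (text[i], text[i + 1]) in self.code:
                out.append(self.code[(text[i], text[i + 1])])
                i += 2
            else:
                out.append(text[i])
                i += 1
        return bytes(out)

    def decompress(self, data: bytes) -> bytes:
        return b"".join(self.expand.get(c, bytes([c])) for c in data)

    def count(self, comp: bytes, pattern: bytes) -> int:
        """Count occurrences of pattern in the original text by scanning comp."""
        if not pattern:
            raise ValueError("pattern must be non-empty")
        m = len(pattern)
        # A leading "second" char or trailing "first" char may fuse with a
        # neighbouring text byte, so search only for the unambiguous core.
        lo = 1 if pattern[0] in self.second else 0
        hi = m - 1 if pattern[-1] in self.first else m
        if lo >= hi:  # no unambiguous core: fall back to the decompressed text
            return count_overlapping(self.decompress(comp), pattern)
        core = self.compress(pattern[lo:hi])
        found, pos = 0, comp.find(core)
        while pos != -1:
            before = self.decompress(comp[pos - 1:pos]) if pos > 0 else b""
            end = pos + len(core)
            after = self.decompress(comp[end:end + 1])
            if (lo == 0 or before.endswith(pattern[:1])) and \
               (hi == m or after.startswith(pattern[-1:])):
                found += 1
            pos = comp.find(core, pos + 1)
        return found
```

Because every compressed byte is one whole token, any byte-level match of the compressed core is automatically aligned with token boundaries, which is what lets an unmodified matcher run on the compressed file.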
We present a new file system that combines name-based and content-based access to files at the same time. Our design allows both methods to be used at any time, thus preserving the benefits of both. Users can create their own name spaces based on queries, on explicit path names, or on any combination interleaved arbitrarily. All regular file operations such as ...
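A minimal in-memory sketch of a directory whose contents combine a content query with explicitly named files follows. The class `SemanticFS`, its methods, and the query semantics (every query word must appear in the file) are hypothetical illustrations, not the interface of the system described above. Queries are evaluated when a directory is listed, so newly written files that match show up without being linked by name.

```python
from typing import Optional


class SemanticFS:
    """Hypothetical in-memory file system mixing name-based and content-based access."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}  # path -> file contents
        # directory name -> (content query or None, explicitly linked paths)
        self.dirs: dict[str, tuple[Optional[str], set[str]]] = {}

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def mkdir(self, name: str, query: Optional[str] = None) -> None:
        self.dirs[name] = (query, set())

    def link(self, dirname: str, path: str) -> None:
        # Name-based access: add an explicit entry regardless of content.
        self.dirs[dirname][1].add(path)

    def listdir(self, name: str) -> list[str]:
        query, explicit = self.dirs[name]
        entries = {p for p in explicit if p in self.files}
        if query is not None:
            # Content-based access: every query word must occur in the file.
            words = query.split()
            entries |= {p for p, text in self.files.items()
                        if all(w in text.split() for w in words)}
        return sorted(entries)
```

Here the union of query results and explicit links is the only combination rule; a fuller design could also let users exclude query results by name.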
We describe two new algorithms for implementing barrier synchronization on a shared-memory multicomputer. Both algorithms are based on a method due to Brooks. We first improve Brooks' algorithm by introducing double buffering. Our dissemination algorithm replaces Brooks' communication pattern with an information dissemination algorithm described by Han and ...
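In a dissemination barrier, each of $n$ processes runs $\lceil \log_2 n \rceil$ rounds; in round $k$, process $i$ signals process $(i + 2^k) \bmod n$ and waits for a signal from $(i - 2^k) \bmod n$, so after the last round every process has transitively heard from all others. The sketch below assumes Python threads and uses counting semaphores in place of shared-memory flags, which also makes the barrier reusable across episodes without a separate reset. The class `DisseminationBarrier` and the test harness `run_demo` are hypothetical, and the double-buffering improvement to Brooks' method is not shown.

```python
import threading


class DisseminationBarrier:
    """Reusable dissemination barrier for n threads with ids 0..n-1 (semaphore sketch)."""

    def __init__(self, n: int):
        self.n = n
        self.rounds = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 1
        # sems[i][k] is released only by thread (i - 2**k) mod n, during round k.
        self.sems = [[threading.Semaphore(0) for _ in range(self.rounds)]
                     for _ in range(n)]

    def wait(self, i: int) -> None:
        for k in range(self.rounds):
            # Notify the thread 2**k ahead, then wait for the one 2**k behind.
            self.sems[(i + (1 << k)) % self.n][k].release()
            self.sems[i][k].acquire()


def run_demo(n: int = 5, episodes: int = 4) -> bool:
    """Return True if no thread ever leaves the barrier before all n arrived."""
    barrier = DisseminationBarrier(n)
    arrived = [0] * episodes
    lock = threading.Lock()
    ok = [True]

    def worker(i: int) -> None:
        for e in range(episodes):
            with lock:
                arrived[e] += 1
            barrier.wait(i)
            with lock:
                if arrived[e] != n:  # someone passed the barrier too early
                    ok[0] = False

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ok[0] and arrived == [n] * episodes
```

Reuse works because the m-th acquire on `sems[i][k]` can only succeed after the partner has made its m-th release, that is, after it has reached round k of episode m.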