SUMMARY A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number of patterns (tens of thousands). Several applications of the multi-pattern matching problem are discussed. We argue that, in addition to previous applications that required such search, ...
The string-matching problem is a very common problem. We are searching for a string $P = p_1 p_2 \ldots p_m$ inside a large text file $T = t_1 t_2 \ldots t_n$, both sequences of characters from a finite character set $\Sigma$. The characters may be English characters in a text file, DNA base pairs, lines of source code, angles between edges in polygons, machines or machine ...
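To make the problem statement concrete, here is a minimal sketch of exact string matching by brute force. It is not the algorithm of the paper, only an illustration of the definition; the function name `find_occurrences` is hypothetical. Because the characters can be any symbols from a finite set, the sketch accepts any Python sequence (a string, or a list of tokens such as lines of source code).

```python
def find_occurrences(pattern, text):
    """Return every index i such that text[i:i+m] equals pattern.

    Brute-force illustration of the string-matching problem:
    pattern P of length m, text T of length n, O(m * n) comparisons.
    """
    m, n = len(pattern), len(text)
    # Slide a window of length m over the text and compare it to the pattern.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


if __name__ == "__main__":
    print(find_occurrences("aba", "ababa"))
    print(find_occurrences(["x", "y"], ["x", "y", "z", "x", "y"]))
```

Overlapping occurrences are reported, since each window is checked independently.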
We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time ...
It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. In this paper we introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance ...
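The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the construction analyzed in the paper: it assumes one pivot element per tree level (so the query's distance to each pivot is computed only once), a symmetric distance function, and leaf buckets that are scanned directly. The names `build`, `search`, and `hamming` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build(elements, pivots, dist, level=0):
    """Build a tree in which every node at depth `level` uses pivots[level].

    Internal nodes are dicts mapping distance-to-pivot -> subtree;
    leaves are plain lists (buckets) of elements.
    """
    if level == len(pivots) or len(elements) <= 1:
        return list(elements)
    groups = {}
    for e in elements:
        groups.setdefault(dist(e, pivots[level]), []).append(e)
    return {d: build(g, pivots, dist, level + 1) for d, g in groups.items()}


def search(tree, pivots, dist, query, radius):
    """Return all stored elements within `radius` of `query`."""
    # One distance computation per level, shared by all nodes at that level.
    qdists = [dist(query, p) for p in pivots]
    results, stack = [], [(tree, 0)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, list):
            results.extend(e for e in node if dist(query, e) <= radius)
        else:
            for d, child in node.items():
                # Triangle inequality: a match e satisfies
                # |dist(e, pivot) - dist(query, pivot)| <= radius.
                if abs(d - qdists[level]) <= radius:
                    stack.append((child, level + 1))
    return results


if __name__ == "__main__":
    words = ["0000", "0011", "1100", "1111", "0001"]
    pivots = ["0000", "1111"]  # arbitrary choice for this sketch
    tree = build(words, pivots, hamming)
    print(sorted(search(tree, pivots, hamming, "0010", 1)))
```

The pruning step is the only place the distance function's properties matter: subtrees whose key differs from the query's pivot distance by more than the radius cannot contain a match, so they are skipped without any further distance computations.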
Let A and B be two sequences of length M and N respectively, where without loss of generality N ≥ M, and let D be the length of a shortest edit script between them. A parameter related to D is the number of deletions in such a script, P = D/2 − (N − M)/2. We present an algorithm for finding a shortest edit distance of A and B whose worst case running time ...
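The relation between D and P can be checked numerically with a small sketch. It assumes that an edit script consists only of insertions and deletions, in which case D = M + N − 2L where L is the length of a longest common subsequence. The code uses a plain quadratic dynamic program, not the algorithm of the paper, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_script_stats(a, b):
    """Return (D, P) for sequences a and b, assuming insert/delete-only scripts."""
    if len(a) > len(b):
        a, b = b, a  # enforce N >= M without loss of generality
    m, n = len(a), len(b)
    lcs = lcs_length(a, b)
    d = m + n - 2 * lcs      # length of a shortest edit script
    p = (d - (n - m)) // 2   # P = D/2 - (N - M)/2, the number of deletions
    assert p == m - lcs      # deletions remove the unmatched symbols of A
    return d, p


if __name__ == "__main__":
    print(edit_script_stats("kitten", "sitting"))
    print(edit_script_stats("abc", "abcd"))
```

When B is A plus a few inserted symbols, P is 0 even though D is positive, which is why a bound in terms of P can be much smaller than one in terms of D.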
We describe two new algorithms for implementing barrier synchronization on a shared-memory multicomputer. Both algorithms are based on a method due to Brooks. We first improve Brooks' algorithm by introducing double buffering. Our dissemination algorithm replaces Brooks' communication pattern with an information dissemination algorithm described by Han and ...